Previous studies have organized alleles of the Mhc class
II Ab gene into 3 evolutionary lineages based on genomic
structure. The major distinction between lineages 1 and 2 is
an 861 bp retroposon in the intron separating the Aβ1 and Aβ2
exons in lineage 2 alleles. By using this retroposon as an
evolutionary tag, we have extended our molecular genetic
studies of Ab to include 115 independently derived H-2
haplotypes from 12 separate species and subspecies of the genus
Mus. Ab alleles from lineages 1 and 2 were found in all 3
aboriginal species (Mus spretus, Mus spicilegus, and Mus
spretoides) and in Mus caroli, indicating that these two
lineages of Ab alleles diverged a minimum of 2.5 million years
ago. Parsimony analysis of 86 Ab alleles, using restriction
sites as character states, indicated that lineage 3 alleles
are evolutionarily more closely related to lineage 2 than to
lineage 1. The DNA sequence of intron 2 from a lineage 3
allele was determined. The data indicated that
lineage 3 was derived from a lineage 2 allele by two
additional insertional events in intron 2. One insertion,
composed of an Alu-like (B1) repeat, occurred 508 bp 3' of the
Aβ1 exon. By using the polymerase chain reaction and restriction
analysis, a lineage 2 allele from Mus m. musculus was
found to carry that B1 insert, thus defining a new lineage,
2B. The other insertion, occurring in the lineage 2
retroposon, starts 1141 bp 3' of the Aβ1 exon. This latter
insertion is 539 bp in length and is composed of Alu-like
repetitive elements and unique sequence. In summary, the
murine Ab genes can be divided into 4 distinct evolutionary
lineages, 1, 2A, 2B, and 3, which were produced by 3
independent retroposon insertions. Lineage 3 alleles were
found in Mus m. musculus and Mus m. domesticus, indicating
that lineage 3, as well as 2A and 2B, diverged a minimum of 0.5
million years ago. These results indicate that all 4
lineages of Ab have persisted through several speciation
events in the genus Mus.
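
The lineage definitions summarized above can be read as a small decision table over the three intron 2 insertions. The following is a minimal sketch under that reading; the names Intron2 and classify_lineage are hypothetical and are not taken from this work, and any combination of insertions not described in the text is reported as unclassified rather than guessed.

```python
from typing import NamedTuple


class Intron2(NamedTuple):
    """Presence or absence of the three insertions described in the text."""
    retroposon_861bp: bool  # 861 bp retroposon separating lineage 2 from lineage 1
    b1_repeat: bool         # Alu-like (B1) repeat, 508 bp 3' of the A-beta-1 exon
    insert_539bp: bool      # 539 bp insertion inside the lineage 2 retroposon


# Combinations named in the text; every other combination is left unclassified.
LINEAGE_TABLE = {
    (False, False, False): "1",
    (True, False, False): "2A",
    (True, True, False): "2B",
    (True, True, True): "3",
}


def classify_lineage(allele: Intron2) -> str:
    """Return the lineage label for an allele, or 'unclassified'."""
    return LINEAGE_TABLE.get(tuple(allele), "unclassified")


if __name__ == "__main__":
    print(classify_lineage(Intron2(True, True, False)))
```

Read this way, each lineage differs from the preceding one by exactly one insertion, which is consistent with the statement that three independent insertions produce the four lineages.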
CHAPTER 1
INTRODUCTION 


The  I  region  of  the  murine  major  histocompatibility 
complex  (H-2)  contains  a  tightly-linked  cluster  of  highly 
polymorphic  genes  (class  II)  that  control  immune 
responsiveness.  Two  major  hypotheses  have  been  proposed  to 
account  for  the  origin  of  this  polymorphism,  which  is  believed 
to  be  essential  for  the  function  of  the  class  II  proteins  in 
immune protection of the host. The first was that hypermutational
mechanisms  (gene  conversion  or  segmental  exchange)  promote  the 
rapid  generation  of  diversity  in  Mhc  genes.  The  alternative 
was  that  polymorphism  arose  from  the  steady  accumulation  of 
mutations  over  long  evolutionary  periods,  and  that  multiple 
specific alleles commonly survived speciation events (trans-species
evolution or ancestral polymorphism). In a previous
study,  McConnell  et  al.  (1988)  used  restriction  fragment 
length  polymorphism  (RFLP)  and  sequence  analysis  to  seek 
evidence  of  "segmental  exchange"  and/or  "trans-species 
evolution"  in  the  class  II  genes  of  the  genus  Mus  by  a 
molecular  genetic  analysis  of  Ab  alleles.  This  study  detected 
31 Ab alleles in a collection of 49 H-2 haplotypes derived
from 5 separate species and subspecies in the genus Mus.
These alleles were organized into 3 evolutionary lineages on
the basis of retroposon polymorphisms occurring in the intron
(intron 2) separating the exons which encode the β1 and β2
domains of Ab. By using this retroposon sequence as an
evolutionary tag, they demonstrated that the Ab alleles in two
of these lineages diverged at least 0.5 million years ago and
that  alleles  from  both  lineages  survived  the  speciation  events 
leading  to  several  modern  Mus  species.  These  findings 
indicate  that  class  II  gene  polymorphisms  are  evolving  in  a 
trans-species  manner,  suggesting  that  the  extensive  diversity 
of  Mhc  class  II  genes  predominantly  reflects  the  steady 
accumulation  of  mutations  in  distinct  lineages  of  alleles 
which  are  selectively  maintained  in  natural  populations  for 
long  evolutionary  periods. 

In this dissertation, we address two additional issues
concerning the evolution of Ab in Mus. The first issue
concerns the evolutionary origin of lineage 3. What is the
nature of the retroposon polymorphism in lineage 3 alleles, and
was lineage 3 derived from lineage 1 or lineage 2? If so,
what kind of evolutionary mechanism generated lineage 3? We
have addressed this issue by sequencing a 3.8 kb DNA segment
containing intron 2 from a prototypic lineage 3 allele. The
results clearly indicate that lineage 3 alleles are derived
from a lineage 2 allele by two additional independent retroposon
insertions in intron 2. The second issue concerns the
distribution of the various Ab lineages within the genus Mus and
how long these Ab lineages have persisted in the genus.
We have addressed this issue by expanding the RFLP analysis
to include 115 independently derived H-2 haplotypes
from 12 separate species and subspecies of the genus Mus. A total
of 86 Ab alleles were identified in this analysis. Parsimony
analysis, using restriction sites as character states, was
also used to construct evolutionary trees of the Ab
alleles and to determine their phylogenetic relationships. DNA
sequence and restriction enzyme analyses indicate that Ab
genes can be divided into 4 distinct evolutionary lineages,
which were generated by three independent insertional events.
The presence of the various lineages in different species and
subspecies of Mus further supports the idea that the Mhc genes
evolved in a trans-species fashion and have persisted over long
evolutionary timespans in the genus Mus.
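
To illustrate what parsimony analysis with restriction sites as character states involves, the sketch below scores candidate trees by the minimum number of site gains or losses they require. It is only an illustration: the text does not say which parsimony algorithm or program was used, so Fitch's counting procedure is assumed here, and the four alleles (a1 to a4) and their site patterns are invented for the example.

```python
def fitch_site(node, states, site):
    """Return (possible ancestral states, changes required) for one site.

    A node is either an allele name (leaf) or a pair of child nodes.
    """
    if isinstance(node, str):
        return {states[node][site]}, 0
    left_set, left_cost = fitch_site(node[0], states, site)
    right_set, right_cost = fitch_site(node[1], states, site)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: one change is counted at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, states):
    """Total number of changes over all restriction sites for a tree."""
    n_sites = len(next(iter(states.values())))
    return sum(fitch_site(tree, states, s)[1] for s in range(n_sites))


# Hypothetical data: 1 = restriction site present, 0 = absent.
states = {
    "a1": (1, 0, 1),
    "a2": (1, 0, 0),
    "a3": (0, 1, 0),
    "a4": (0, 1, 1),
}

# The three possible groupings of four alleles into two pairs.
trees = [
    (("a1", "a2"), ("a3", "a4")),
    (("a1", "a3"), ("a2", "a4")),
    (("a1", "a4"), ("a2", "a3")),
]

best = min(trees, key=lambda t: parsimony_score(t, states))

if __name__ == "__main__":
    for t in trees:
        print(t, parsimony_score(t, states))
```

The tree requiring the fewest changes is preferred. With 86 alleles the number of possible trees is far too large to enumerate in this way, so a real analysis would rely on heuristic tree searches.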


CHAPTER  2 

GENOMIC  ORGANIZATION  OF  MAJOR  HISTOCOMPATIBILITY  COMPLEX 


In the past decade our understanding of the major
histocompatibility complex has advanced dramatically because
of the application of both monoclonal antibody techniques and
recombinant DNA technology. Biologists are now able to
characterize in molecular terms one of the most fundamental
phenomena of eukaryotic biology: the ability of organisms to
discriminate between self and nonself. Even the most
primitive of metazoa, the sponges, display cell surface
recognition systems capable of discerning and destroying
nonself, probably to maintain the integrity of individuals
surviving in densely populated environments (Hildemann et al.
1981). There are three fundamental features of these
self/nonself recognition systems: cell-surface recognition
structures, effector mechanisms that result in the destruction
of nonself, and a high degree of genetic variability in the
recognition structures (Hood et al. 1983).

In mammalian genetic systems, a chromosomal region termed
the Mhc encodes a self/nonself recognition system with
similar features. Although all vertebrates appear to possess
a homologous Mhc, it has been most extensively studied in
mouse (H-2) and in man (HLA) (Gotze et al. 1977). The Mhc was
first identified in mice because of the availability of
inbred and congenic strains. By grafting tumors
or skin between such strains and following rejection
or acceptance of the graft, Gorer and others (Gorer et al.
1938, 1948) were able to map the rejection phenomena to a
region on chromosome 17, which was then denoted the Mhc. In the
mouse at least 60 traits, most of which are associated with
the immune response, have been mapped to the Mhc using classical
genetic techniques (Klein 1975).

H-2  Complex 

Mhc  is  defined  as  a  group  of  genes  coding  for  molecules 
that  provide  the  context  for  the  recognition  of  foreign 
antigens by T lymphocytes (Klein 1983). "Context" implies
that T cells do not recognize antigen alone but instead
recognize antigen in the context of Mhc molecules on the
surface of antigen-presenting cells. Thus far, Mhc genes have
been found only in vertebrates. It is not known whether all
vertebrates possess an Mhc, but so far it has been identified in
twenty vertebrate species (Klein 1986).

Three  Classes  of  Mhc  Genes 

Traditionally, the Mhc genes are divided into three
classes: I, II and III. Class I molecules are involved in
transplantation rejection and T-cell-mediated cytotoxic
killing. Class II molecules serve as restriction elements
during the processing and presentation of foreign antigen to
regulate the immune response. Certain complement components,
e.g. C3 and C4, are encoded by class III genes within the Mhc
complex. However, no significant homology can be shown
between Mhc genes and complement genes, and although the C4
gene is closely linked to the Mhc in many species, the C3 gene
is only loosely linked to it in some species and not linked in
others (Alper 1981). Klein et al. (1983) have argued against
the inclusion of the complement genes as a class of Mhc genes.

Organization  of  Mouse  Mhc 

The H-2 complex of the laboratory mouse is the only Mhc
in which nearly all of the loci have been identified and
their positions determined. For example, the molecular maps of
the Mhc genes of the C57BL/10 (Weiss et al. 1984) and BALB/c
(Steinmetz et al. 1982a; Winoto et al. 1983) haplotypes have
been extensively characterized. From the centromeric part of
the Mhc of the BALB/c mouse, a 600 kb gene cluster has been
cloned containing two class I genes (K and K2) and seven class II
genes (Pb (Aβ3) to Ea) (Steinmetz et al. 1986) (Figure 2-1).
Following a gap of about 170 kb, a second gene cluster of
330 kb in length has been cloned from the S region. It contains
genes coding for complement or related components (C4, Slp, Bf,
C2) and two homologous genes (21-OHA and 21-OHB), one of which
encodes steroid 21-hydroxylase (Muller et al. 1987). A
third gene cluster covering 500 kb of DNA has been isolated
from the D and Qa regions; it contains 13
class I genes (D to Q1-10) (Stephan et al. 1986) and the TNF-α and
-β genes coding for cytotoxins (Muller et al. 1987b). In
the Tla region, a total of 19 class I genes are distributed
in 3 gene clusters. In summary, the Mhc of the BALB/c
mouse contains 50 loci, of which 34 are class I and 7 are
class II genes (Steinmetz & Uematsu 1987). In the Mhc
of the C57BL/10 mouse, by contrast, 26 class I genes have been
identified, of which 10 are in the Qa-2,3 region and 13 are in the
TL region (Flavell et al. 1985). Among the 3 H-2 haplotypes (b,
d and k) analyzed thus far, the K and the class
II regions show no large differences in organization (Klein
& Figueroa 1986).
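
The BALB/c locus counts quoted above can be cross-checked with simple arithmetic. The short sketch below does only that; the variable names are illustrative, and the figure for loci that are neither class I nor class II is a derived remainder, not a number stated in the cited sources.

```python
# Class I gene counts per region of the BALB/c Mhc, as quoted in the text.
balb_c_class_i = {
    "K region": 2,            # K and K2
    "D and Qa regions": 13,
    "Tla region": 19,
}

total_loci = 50   # total loci reported for the BALB/c Mhc
class_ii = 7      # class II genes reported

class_i_total = sum(balb_c_class_i.values())
other_loci = total_loci - class_i_total - class_ii  # derived, not quoted

if __name__ == "__main__":
    print(class_i_total, other_loci)
```

The regional counts sum to the 34 class I loci given in the summary, leaving 9 loci for the complement, 21-hydroxylase, TNF and other genes.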

Genetic loci of class I genes

There are two class I genes (H-2K and H-2K1) at the
centromeric end of the H-2 region; all the remaining class I
genes are at the telomeric end. The class I loci can be divided into
two subclasses: I-a, consisting of loci with a known function
(H-2K, H-2D, H-2L), and I-b, consisting of the remaining loci,
whose functions are largely unknown. The class II loci and
a group of unrelated loci, including genes coding for
complement components, are inserted between the two H-2K loci and
the rest of the class I loci (Figure 2-1). The class I loci can
be assigned to one of four regions, K, D, Qa and Tla,
depending on their position; this division only in part
reflects the evolutionary relationships among the individual
loci (Klein & Figueroa 1986). Class I transplantation
antigens are found on virtually all nucleated cells of the
mouse. The cell surface antigens encoded in the Qa-2,3 and Tla
regions can be further distinguished from classical class I
antigens because they are less polymorphic and more limited in
tissue distribution than K- or D-encoded antigens (Flaherty et
al. 1980).

Class I gene structure. The exon-intron organization of
class I genes is remarkably similar from gene to gene. Each
class I gene is composed of 8 exons, which correlate
precisely with the domain structure of the class I polypeptide
(Figure 2-2) (Steinmetz et al. 1981; Nathenson et al. 1981).
The first exon encodes the leader peptide; the second, third,
and fourth exons encode the α1, α2 and α3 domains. The fifth
exon encodes the transmembrane region, and the sixth, seventh,
and eighth exons encode the cytoplasmic domain and 3'
untranslated region (Figure 2-2).
Class I polypeptide. The class I protein has a molecular weight of
45,000 daltons and is a transmembrane glycoprotein
noncovalently associated with β2-microglobulin (β2m), a
12,000-dalton polypeptide encoded by a gene located on
chromosome 2 in the mouse (Goding et al. 1981; Michaelson et
al. 1981; Robinson et al. 1981). Amino acid sequence analyses
have demonstrated that the class I antigen can be divided into 5
domains (Coligan et al. 1981). The three external domains,
α1, α2 and α3, are each about 90 residues in length. The
transmembrane portion is about 40 residues long and the cytoplasmic
region is about 30 residues long. The α2 and α3 domains each have
a centrally placed disulfide bridge spanning about 60 residues,
and up to three N-linked glycosyl units are bound to these domains
(Maloy et al. 1982). Amino acid sequence analyses also
suggest that the α3 domain (Strominger et al. 1980) and
β2-microglobulin (Peterson et al. 1972) show strong sequence
homology to the constant region domains of immunoglobulins.
Binding studies of class I molecules with peptide fragments
have shown that the β2m subunit associates with the α3 domain
(Yokoyama et al. 1983).
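
As a rough consistency check on the figures in this paragraph, the approximate domain lengths can be summed and converted to a polypeptide mass. This is a back-of-the-envelope sketch only: the value of 110 daltons per residue is a common rule-of-thumb average assumed here, not a number from the cited studies, and the names are illustrative.

```python
# Approximate domain lengths (in residues) quoted in the text.
DOMAIN_LENGTHS = {
    "alpha1": 90,
    "alpha2": 90,
    "alpha3": 90,
    "transmembrane": 40,
    "cytoplasmic": 30,
}

AVERAGE_RESIDUE_MASS_DA = 110   # assumption: rule-of-thumb average residue mass
REPORTED_MASS_DA = 45_000       # class I heavy chain mass quoted in the text

total_residues = sum(DOMAIN_LENGTHS.values())
polypeptide_mass_da = total_residues * AVERAGE_RESIDUE_MASS_DA
unaccounted_da = REPORTED_MASS_DA - polypeptide_mass_da

if __name__ == "__main__":
    print(total_residues, polypeptide_mass_da, unaccounted_da)
```

Under these assumptions the five domains account for about 340 residues and roughly 37,000 daltons of polypeptide. The remainder is of the order that N-linked carbohydrate on a glycoprotein could contribute, although the estimate is too coarse to support a firm attribution.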

Three dimensional model of class I molecules. Recently,
the three dimensional structure of the human class I molecule
HLA-A2 was determined by X-ray crystallographic analysis (Bjorkman
et al. 1987a, b). Soluble HLA-A2 was purified and
crystallized after papain digestion of plasma membranes from
a homozygous human lymphoblastoid cell line. Papain treatment
yields a molecule composed of α1, α2, α3 and β2m. This class
I molecule consists of two pairs of structurally similar
domains: α1 has the same tertiary fold as α2, and likewise α3 has
the same tertiary fold as β2m. The α3 domain and β2m both have
β-sandwich structures composed of two antiparallel β-pleated
sheets, one with four β-strands and one with three β-strands,
connected by a disulphide bond. The same tertiary structure
has been shown for the constant region of immunoglobulin and is
consistent with the high degree of sequence homology between α3,
β2m and the constant region. The structurally similar α1 and α2
domains are paired, with the four β-strands from each domain
forming a single antiparallel β-sheet with eight strands.
This particular intramolecular "dimeric interaction"
(McLachlan et al. 1980) seen between α1 and α2, involving the
creation of a single β-sheet from two domains, has been
observed in many intermolecular dimers and has been proposed
to be preserved in an intermolecular dimer such as the Mhc class
II molecule (Bjorkman et al. 1987b).

Antigen binding site of class I molecule. Several
observations suggest that the groove between the α1 and α2 helices
is the antigen binding site (ABS) (Bjorkman et al. 1987b).
It is located at the end of the molecule distal from the membrane,
in a position where it can be recognized by receptors on
another cell. The site, ~25 Å long by 10 Å wide by 11 Å
deep, has a size and shape consistent with this role.
By analogy with class II molecules, class I molecules are thought
to bind processed antigen in the form of peptides. Synthetic peptides
have been shown to bind to purified murine class II molecules,
presumably mimicking processed antigen (Guillet et al. 1986).
Because class I and class II molecules have homologous
structures (Kaufman et al. 1984) and T cells specific for
either class I or II molecules use the same receptors (Rupp
et al. 1985; Marrack & Kappler 1986), the type of interaction
described between peptides and class II molecules is assumed
to apply to peptides and class I molecules. Electron density
representing an unknown molecule, possibly a bound peptide
antigen, is found in the site in two crystal forms of HLA-A2
(Bjorkman et al. 1987b). An α-helical
conformation has been proposed for bound peptide (Berkower et
al. 1986; Allen et al. 1987). Thus, one face of a peptide
α-helix is envisioned to contact the class II molecule and the
other to be contacted by the T cell receptor. Many of the
polymorphic residues that are responsible for recognition by
T cells and haplotype-specific association with antigens are
located in this site, where they could serve as ligands to a
processed antigen. This is further evidence that this region
functions as the antigen binding site (Bjorkman et al. 1987b).
Most of the non-conserved residues are located in and around the
ABS, suggesting that most variable residues in class I
molecules have been selected to generate an ability to present
many different peptides. It is also noted that some
conserved amino acid residues are located in the ABS,
suggesting that they may recognize a constant feature of
processed antigens, consistent with previous suggestions.

Genetic  Organization  of  the  I  Region 

In the past the I region had been divided into five
subregions by serological and functional analysis of
recombinant H-2 haplotypes: I-A, I-B, I-J, I-E and
I-C (Murphy 1981; Klein et al. 1981; Klein et al. 1983). The
subregions are defined by crossover positions in H-2
recombinant strains. However, so far only four I
region-associated (Ia) products have been identified by both
serological and biochemical analysis (Jones 1977; Uhr et al.
1979). The failure to identify gene products encoded by the I-B,
I-J, and I-C subregions has been explained as follows:

I-B  subregion 

The existence of a separate I-B subregion was initially
proposed by Lieberman and coworkers (1972) to explain the
genetic control of the antibody response to a myeloma protein.
The involvement of the I-B subregion was later postulated for
immune responses to at least five other antigens: lactate
dehydrogenase B (LDHB) (Melchers et al. 1973), staphylococcal
nuclease (Lozner et al. 1974), oxazolone (Fachet et al. 1977),
the male-specific antigen (Hurme et al. 1978), and
trinitrophenylated mouse serum albumin (Urba et al. 1978).
In all these cases the mapping of genes controlling the immune
response centered around the four critical H-2 haplotypes
used by Lieberman and her co-workers, i.e.
B10.A (H-2a), C57BL/10, B10.A(4R), and
B10.A(5R) (H-2i5).
However, further analysis by Baxevanis et al. (1981) of the
response to LDHB and to myeloma protein MOPC173 revealed the
involvement  of  and  cells  in  response  to  these  antigens, 
making  the  postulate  of  a  separate  I-B  subregion  unnecessary. 

I-J  subregion 

This locus was originally defined serologically and
mapped between I-A and I-E by reciprocal alloantisera raised
between strains B10.A(3R) and B10.A(5R), which are inbred
congenic recombinant strains with a crossover between the I-A and
I-E subregions (Murphy et al. 1978a, 1978b). Alloantisera and
monoclonal antibodies raised against I-J-encoded molecules
react with determinants expressed on suppressor T cells and with
the soluble suppressor T cell factors released by these cell
lines (Krupen et al. 1982). A large body of experimental
data supports the existence of the I-J locus (Murphy
et al. 1978a; Waltenbaugh et al. 1981). However, its true
identity and chromosomal location remain elusive. By using
restriction fragment length polymorphisms (RFLP) to map the
crossover points among inbred congenic mouse strains that have
recombination events between the I-A and I-E loci, the I-J subregion
was mapped to a 3.4 kb segment of DNA between I-A and I-E,
including the 3' half of the Eb gene (Steinmetz et al. 1982).
Molecular cloning of this 3.4 kb region from ten parental and
intra-I recombinant inbred strains has narrowed the distance
between the crossover points separating I-A and I-E to 2.0 kb,
contained entirely within the intron between the Eβ1 and Eβ2
exons of the Eb gene (Kobori et al. 1984). Although many
explanations have been put forth to account for the apparent
paradox of I-J, all of them are refuted by experiments
showing that cloned DNA of this region fails to hybridize to
mRNA isolated from I-J+ suppressor T cell lines (Kronenberg
et al. 1983).

I-C subregion

This subregion was defined by the Ia.6 specificity,
detected as a cytotoxic antibody present in B10.A(4R) (H-2h4)
anti-B10.A(2R) (H-2h2) antiserum (Sandrin et al. 1981). These
antisera containing purported anti-I-C antibodies were shown
to react with a suppressor factor generated in a mixed
lymphocyte reaction (MLR) (Rich et al. 1979; Rich et al.
1979). An MLR generated in a congenic strain combination
differing at the I-C subregion can be inhibited by the
addition of anti-I-C antisera (Okuda et al. 1978). Mapping
by classical genetic methods has suggested a locus in the I-C
subregion between Ea and the gene coding for the C4 complement
component. Although this segment of DNA has not been
characterized using molecular techniques, the data available
do not lend support to the existence of I-C. Others have
never been able to demonstrate any activity in I-C-defining
anti  combination    by    serological    methods,  MLR, 

graft-versus-host reaction, or cell-mediated
lymphocytotoxicity (CML) assays (Juretic et al. 1981; Livnat et al.
1973).

Linkage  Relationship  of  Class  II  Genes 
Class  II  gene  loci 

Chromosomal walking through the I region by the ordering
of overlapping cosmid clones (Steinmetz et al. 1982a), together
with genetic mapping of restriction fragment length
polymorphisms (Mathis et al. 1983; Hood et al. 1983), has
allowed the chromosomal localization of the loci encoding the
four functionally defined class II genes. A continuous stretch
of about 500 kb of DNA encompassing the I region was first
isolated by screening a BALB/c sperm cosmid library with a
human Mhc class II DRA cDNA probe (Steinmetz et al. 1982a).
This 500 kb region of DNA includes the right end of the I region,
since the complement component C4 gene, which maps to the S
region, can be identified in it (Figure 2-1). The C4 gene is located
a few hundred kb distal to the Ea gene and was identified by a
synthetic oligonucleotide probe specific for the amino
terminus of the C4a subunit. Five class II genes, Aa, Ab, Eb,
Eb2, and Ea, extending over a 90 kb region of DNA, have been
identified. Ab, Aa and Ea were identified by DNA sequence
analysis, and Eb was identified by a specific oligonucleotide
probe. Eb2 was identified by cross-hybridization with a human
DRA cDNA probe and the mouse Eb gene. The identity of the Eb gene
was confirmed by comapping via RFLP analysis, which localized a
serologically defined Eb recombinant to the middle of the Eb gene
(reviewed by Hood et al. 1983). Southern blot analysis of
mouse genomic DNA with class II probes suggested that class
II genes are single copy and that there are no more than two
α genes and six β genes in the mouse genome (Steinmetz et al.
1982a; Devlin et al. 1984). All the known class II loci are
contained in a tightly linked cluster inserted between the
H-2K and C4 genes. This cluster contains 4 functional genes
and 4 pseudogenes, which are divided into two
subclasses, I-A and I-E. The eight class II genes, Pb (Aβ3),
Ob (Aβ2), Ab, Aa, Eb, Eb2, Ea, and Eb3, are arranged in this
order from the centromeric towards the telomeric end
(Steinmetz et al. 1982a; Davis et al. 1984; Larhammar et al.
1983; Widera et al. 1985) (Figure 2-1 and Figure 2-3). Of
the eight genes, only four have been shown to encode gene
products: Aα pairs with Aβ to form I-A molecules, and Eα with Eβ
to form I-E molecules (Jones et al. 1978; Uhr et al. 1979).
The Ob and Eb2 genes are reported to be transcribed, but at
very low levels, and have no detectable protein product (Wake
& Flavell 1986). The Pb gene is a pseudogene, at least in the
b and k haplotypes, as it has a deletion of eight nucleotides
and a termination codon in its sequence. The Eb3 gene thus far has
been    found   only    in   the  haplotype,    but   probably  also 

exists  in  other  haplotypes  (Flavell  et  al.  1985b) .  All 
haplotypes  studied  thus  far  contain  these  class  II  genes.  The 
distances  between  these  genes  are,  with  a  few  exceptions, 
approximately  the  same  in  different  haplotypes. 

Biochemistry  of  Class  II  Molecules 

Up to now only four I region-associated (Ia) products
have been identified by both serological and biochemical
methods. The I-A subregion contains 3 loci that encode three
serologically detectable polypeptides: Aα, Aβ, and Eβ (Jones
et al. 1978). The I-E subregion contains a locus that encodes a
fourth class II polypeptide chain, Eα (Uhr et al. 1979).

Structure of class II polypeptides

The two class II molecules encoded in the I-A and I-E
subregions are both heterodimeric glycoproteins composed of
one heavy (α) and one light (β) chain (Figure 2-3 and Figure
2-4). The α chains range in molecular weight from 30,000 to
33,000 and the β chains from 27,000
to 29,000. The difference in molecular weight between the α and β
chains is due to an extra N-linked glycosyl unit attached to
the α chain (reviewed by Klein et al. 1983). The structures of the
class II polypeptides have been determined in a number of
studies (McNicholas et al. 1982; Mathis et al. 1983a; Malissen
et al. 1984; Benoist et al. 1983; Larhammar et al. 1983;
Estess et al. 1986). The sequence data available suggest that
the mouse I-A and I-E molecules are homologous to the human DQ and
DR class II molecules, respectively (McNicholas et al. 1982;
Malissen et al. 1983a; Larhammar et al. 1983). Each chain of a
class II molecule consists of two extracellular domains, α1 and α2
or β1 and β2, each about 90 residues in length, a transmembrane
region of about 30 residues, and a cytoplasmic tail of about
10-15 residues. Three of the four extracellular domains (α2,
β1 and β2) have a centrally placed disulfide bridge spanning
about 60 amino acid residues, while α1 does not. The
membrane-proximal domains of both α and β, like that of class
I molecules, show strong homology to immunoglobulin
constant-region domains. In this respect, the class I and class II
molecules are very similar to each other in overall
organization and domain structure. For each of the two
polypeptide chains of class II molecules, the
polymorphic residues are concentrated in the amino-terminal
α1 and β1 domains (Benoist et al. 1983; Larhammar et al. 1983).
These domains are responsible for binding peptides in what
appears to be a single site. By aligning the sequences of
class II α and β chains with the class I heavy chain,
matching the α1 and β1 domains of class II with the α1 and
α2 domains of class I, a hypothetical tertiary structure for
class II molecules has been proposed (Brown et al. 1987) (Figure
2-5). The folding of the class II molecule resembles that of
class I, in that two α helices are supported by an
eight-stranded β-pleated sheet (Brown et al. 1988). The recent
results of Perkins et al. (1989) showing that peptides presented by
class I molecules can be presented by class II molecules, and
vice versa, support the notion that the structures of the
peptide-binding sites are similar in class I and class II.

Structures  of  class  II  genes 

There is a striking correlation between the gene
organization and the domain structure of Mhc class II molecules
(Figure 2-4). Both α and β genes begin with a leader-encoding
exon that also contains 3-6 residues of the mature protein.
Exons 2 and 3 encode the α1 or β1 and the α2 or β2 domains,
respectively. β genes have three further exons encoding the
transmembrane (TM), cytoplasmic (CY), and 3' untranslated
(3'UT) regions, while α genes have the TM, CY, and the beginning
of the 3'UT region in exon 4, and the rest of the 3'UT region in
exon 5 (Larhammar et al. 1983; Estess et al. 1986).
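
The exon-to-domain correspondence described above can be written down as a small lookup table. The following Python sketch is only an illustration: the names `CLASS_II_EXONS` and `exons_encoding` are hypothetical, and the one-region-per-exon split of the last three β exons is an assumption, since the text states only that three exons together encode the TM, CY, and 3'UT regions.

```python
# Exon layout of class II genes as summarized in the text.
# Assumption: the last three beta exons are split one region per exon;
# the text only says that three exons encode TM, CY and 3'UT.
CLASS_II_EXONS = {
    "alpha": {
        1: ["leader", "start of mature protein"],
        2: ["alpha1"],
        3: ["alpha2"],
        4: ["TM", "CY", "3'UT"],   # beginning of the 3'UT only
        5: ["3'UT"],               # remainder of the 3'UT
    },
    "beta": {
        1: ["leader", "start of mature protein"],
        2: ["beta1"],
        3: ["beta2"],
        4: ["TM"],
        5: ["CY"],
        6: ["3'UT"],
    },
}


def exons_encoding(chain, region):
    """Return the exon numbers of `chain` that encode `region`."""
    exons = CLASS_II_EXONS[chain]
    return [number for number in sorted(exons) if region in exons[number]]


if __name__ == "__main__":
    print(exons_encoding("alpha", "3'UT"))
    print(exons_encoding("beta", "beta1"))
```

Keeping the layout as data rather than prose makes the one structural difference between the two gene families, the placement of the 3'UT region, easy to see.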

Analysis of the Structure-Function Relationship of Class II
Molecules

The application of DNA-mediated gene transfer (DMGT) has
been a major advance in the analysis of the structure-function
relationships of Mhc gene products. In particular, DMGT has
provided insight into the biochemical bases of immune
recognition and regulation, which depend heavily on the fine
structure of the Mhc-encoded products and of the T cell
receptors with which they interact.

Regulation  of  class  II  gene  expression 

The expression of class II genes is normally restricted to a
small number of tissues (Klein 1986). Cell surface expression of
class II is up-regulated by gamma interferon, which can increase
both class I and class II gene expression (King & Jones 1983).
Gamma interferon appears to act at the level of transcription,
since surface expression is correlated with the level of
specific mRNA (Nakamura et al. 1984). Initial studies on class
II gene expression following transfection were performed using
cells that either constitutively expressed endogenous class II
genes (B lymphomas) or could be induced to express them
(macrophage cell lines) (reviewed by Germain & Malissen 1986).
Introduction of genomic copies of mouse class II genes into B
lymphomas resulted in high levels of transcription and in
expression of the products of the transfected genes on the cell
surface (Ben-Nun et al. 1984). However, it was difficult to
assign the observed serologic or T cell restriction effects to
the introduced gene products, because the assembly of a variety
of class II molecules following the introduction of α and/or β
chains prevented the dissection of which introduced chain caused
the phenotypic traits. Ia-negative mouse fibroblast L cell
lines, derived from the original L-cell line of C3H fibroblasts,
have been used for a variety of gene transfer studies. Using
cosmid clones containing the complete DRA and DRB genes,
Rabourdin-Combe & Mach (1983) first demonstrated that L cells
can express class II molecules. No expression was seen when
either the DRA gene or the DRB gene was introduced separately.
This is consistent with the suggestion that α:β pairing is
required for efficient cell-surface expression of Mhc class II,
although one recombinant strain, A.TFR5, has been suggested to
express a free chain on the cell surface (Begovich et al. 1985).
These observations were confirmed by the studies of Malissen and
coworkers (1984) and Norcross et al. (1985) with mouse class
II genes. In both studies, transfection of either the α or the β
chain gene alone failed to lead to membrane expression,
whereas cotransfection of Aα:Aβ pairs derived from the same
haplotype resulted in significant surface expression. These
results agree with those obtained using Ia-positive recipient
cells, in which independently transferred α or β chain genes
are expressed only through pairing with the endogenous
complementary class II gene products (Ben-Nun et al. 1984).
However, one should be cautious about the view that α:β
heterodimers are required for surface expression, as most of
the monoclonal antibodies used for the detection of membrane
molecules have not been shown to react with single α or β
chains, which presumably assume a different conformation as
single chains than when paired with the complementary chain.
Thus, the surface expression of an isolated α or β chain might
be undetectable using standard reagents. Additional
experiments, however, are also consistent with a lack of surface
expression of free α or β chains. McCluskey et al. (1985)
compared the surface expression of an Aβ chain gene in L cells
with the membrane expression of a chimeric class II:class I
gene. The chimeric molecule is composed of an Aβ1 domain
covalently linked to the α3, TM and CY portions of a class I D
molecule. Following transfection, the expression of the
chimeric gene can be detected with both anti-I-A and anti-α3
(D) monoclonal antibodies. The same anti-I-A antibodies failed
to detect surface expression on L cells transfected only with
the native Aβ chain gene, although these cells were shown to
contain a high level of Ab mRNA. This pair of cell lines was
also analyzed using a rabbit anti-I-A heteroantiserum, which has
been shown to precipitate free Aβ chain from a reticulocyte
lysate in vitro translation product (Robinson et al. 1983) and
to detect both Aα and Aβ polypeptides in western blots
(Germain & Malissen 1986). Again, the cells containing the
chimeric gene stained, but the cells containing the native Aβ
gene alone did not. These results indicate that single α or β
chains do not reach the cell surface efficiently and further
imply that the Aβ1 domain per se does not prevent surface
expression.

Dispensability of I-E molecules

It has been estimated that some 20% of mice in wild
populations do not express I-E molecules (Gotze et al. 1981).
Laboratory inbred mouse strains of the b, s, f, and q haplotypes
fail to express serologically detectable I-E molecules (Jones
et al. 1981). The defect in mice of the b and s haplotypes is due
to a deficiency of Eα chains; the Eα polypeptide is undetectable
in the cytoplasm, while a normal amount of cytoplasmic Eβ
chains can be visualized by 2-D gels (Jones et al. 1981). The
expression defect of these strains can be complemented by
crossing b or s haplotypes with Ea-expressing strains, which
results in normal expression of hybrid I-E molecules in the F1
hybrids (Jones et al. 1981). In contrast, neither Eα nor Eβ
chains can be detected in the cytoplasm of f- and q-haplotype
mice, because of defective processing of both Ea and Eb mRNA
(Mathis et al. 1983; Tacchini-Cottier et al. 1988).

Combinatorial  association 

L cells have also been used to examine the issue of
allelic control of α:β pairing and restrictions on
cross-isotype α:β assembly. Initial studies by Fathman & Kimoto
(1981) and Silver et al. (1980) suggested that Ia-positive cells
from heterozygous individuals contain a mixture of Ia molecules
derived from the free assortment of allelic α and β chains of
a single isotype in all possible combinations. Thus, in an F1
hybrid between two H-2 haplotypes, one would find the two
parental and the two hybrid Aα:Aβ heterodimers in approximately
equivalent proportions. Such α- and β-chain mixing within an
isotype did not seem to occur between distinct isotypes (i.e.
between A and E chains). However, during attempts to develop
cell lines expressing only F1-type Ia molecules (i.e.
haplotype-mismatched Aα:Aβ dimers), it was found that although
haplotype-matched Aα:Aβ pairs yielded high expression in primary
transfectants, cotransfection of haplotype-mismatched pairs
gave little or no expression (Germain et al. 1985). This was
true even though the genes used for the matched and mismatched
gene pairs were identical, and despite the presence of
detectable Aa and Ab mRNA in the nonexpressing cells.
Additional experiments revealed that for genes of the b, d and k
haplotypes, cis-chromosomal α:β pairs (both chains from the same
haplotype) always gave better expression than trans-pairs
(chains from different haplotypes); the expression of the latter
varied over a wide range, depending on the particular allelic
forms of α and β employed. Furthermore, the two
haplotype-mismatched Aα:Aβ molecules that were the basis for the
previously suggested "free pairing" are the best expressed
haplotype-mismatched combinations, whereas at least one other
mismatched combination has never been detected. In order to map
the region of the Aβ molecule controlling the preferential
pairing, recombinant Aβ molecules involving the b, d and k
alleles were constructed. The entire Aβ1 domain was exchanged
between different alleles, or the amino-terminal half of Aβ1 was
covalently linked to the carboxyl-terminal half of Aβ1 and
various Aβ2, TM and CY regions. These "domain and hemi-domain
shuffled" Ab genes were independently cotransfected with Aa
genes of the b, d or k haplotype into L cells. The results
indicate that the most important portion of Ab with respect to
α:β pairing is the amino-terminal half of Aβ1, in that molecules
containing this region from a given allele were expressed best
with cis-matched Aα, and at levels similar to wild-type Aβ,
irrespective of the origin of the remainder of the Ab gene.
However, when isotype-mismatched α:β pairs were cotransfected
into L cells, the results were quite unexpected.
Although introduction of one Ab allele together with Ea yielded
no surface Ia detectable with either anti-Ab or anti-E
antibodies, a second Ab allele did pair with Ea to produce
membrane molecules reactive with anti-I-A (Aβ) and anti-I-E
(Eα) antibodies. Immunoprecipitation studies showed that these
molecules existed as noncovalently associated dimers (Germain &
Quill 1985). These data support the view that Aa and Ab genes
located on the same chromosome coevolve for best "fit", such
that cis-pairs form more efficiently than trans-pairs (Figure
2-6). This view is further supported by the studies of
McNicholas et al. (1982), which showed an 8-10 fold preference
for the assembly of one allelic Eα:Eβ combination over the other
in cells of (B10.A(4R) x B10.PL) mice. The data on cross-isotype
molecules indicate that the control of α:β pairing is strongly
influenced by the highly polymorphic amino termini.
To evaluate the relative efficiency of inter- versus
intraisotypic Ia dimer expression, L cells were sequentially
transfected with multiple class II α and β chain genes
(Germain & Sant 1989). Individual clones were then analyzed
both for the level of mRNA produced by the transfected genes
and for their expression of inter- and intraisotypic dimers at
the surface. In a three-gene transfection system (e.g., Ab, Ea,
and Eb), it was found that the isotype-matched Eα:Eβ dimer was
expressed at 3-5 times the efficiency of the isotype-mismatched
Eα:Aβ dimer, based on the amounts of each β chain required to
drive cell surface expression of a limited amount of Eα. When
Aα and Eα were compared for their coexpression with a relative
excess of Aβ, the efficiency advantage of the isotype-matched
(Aα:Aβ) over the isotype-mismatched (Eα:Aβ) dimer was about 3-
to 4-fold. Additional experiments employing transfectants
expressing Ab, Aa, Eb, and Ea showed that in clones expressing
mRNA ratios similar to those of B cells, only the
isotype-matched dimers were expressed. In clones that expressed
high levels of Aβ, there was a significant amount of Eα:Aβ at
the cell surface in addition to the isotype-matched Aα:Aβ and
Eα:Eβ dimers. These data indicate that asymmetry in the
production of individual chains can lead to the expression of
less favored isotype-mismatched dimers. In a recent report,
recombinant mouse strains and transgenic mice with defective
Eb genes, but with normal Ea genes, were examined for surface
expression of E molecules (Anderson & David 1989).
Eα-containing molecules were shown to be expressed in the
recombinant strains B10.RFB2 and B10.RQB3 by cell surface
staining with an anti-Eα monoclonal antibody (14-4-4) in flow
cytometry analysis. It has been proposed that these
Eα-containing molecules may in fact be hybrid Ia dimers formed
by Eα:Aβ pairing, as they cannot be stained by Eβ-specific
antibodies and can be detected in mice carrying an Ea
transgene. This finding is further supported by the
demonstration of EαᵈAβᵈ as a major class II molecule at the
cell surface of a BALB/c B cell lymphoma (Spencer & Kubo 1989).
Furthermore, although the hybrid Eα:Aβ dimer cannot be isolated
by immunoprecipitation, it can function in vivo, leading to the
clonal deletion of two Vβ TcR subsets, Vβ6 and Vβ11 (Anderson
& David 1989), which have been shown to interact with the I-E
molecule during thymic selection (Kappler et al. 1987).
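
The pairing categories used in this section (cis versus trans, isotype-matched versus isotype-mismatched) can be made concrete by enumerating every α:β combination that completely free pairing would allow in an F1 heterozygote. The Python sketch below is purely illustrative and is not taken from the cited studies: the function name `possible_dimers` and the haplotype labels `x` and `y` are placeholders, and it says nothing about which dimers are actually expressed.

```python
from itertools import product


def possible_dimers(haplotypes=("x", "y"), isotypes=("A", "E")):
    """List every alpha:beta pair allowed by unrestricted pairing.

    Each chain is identified by its isotype (A or E) and by the
    haplotype it came from (placeholder labels, not real alleles).
    """
    chains = list(product(isotypes, haplotypes))
    dimers = []
    for (a_iso, a_hap), (b_iso, b_hap) in product(chains, chains):
        dimers.append({
            "alpha": f"{a_iso}alpha^{a_hap}",
            "beta": f"{b_iso}beta^{b_hap}",
            "isotype_matched": a_iso == b_iso,  # e.g. Aalpha with Abeta
            "cis": a_hap == b_hap,              # both chains from one haplotype
        })
    return dimers


if __name__ == "__main__":
    for dimer in possible_dimers():
        kind = "cis" if dimer["cis"] else "trans"
        iso = "matched" if dimer["isotype_matched"] else "mismatched"
        print(f'{dimer["alpha"]}:{dimer["beta"]}  {kind}, isotype-{iso}')
```

With two haplotypes and two isotypes the enumeration yields sixteen conceivable dimers, of which only four are both cis and isotype-matched; the transfection results summarized above indicate that these four are the favored ones.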

Functional Role of Mhc Genes

One of the most distinctive features of Mhc gene products
is their extensive genetic diversity. One of the most
important breakthroughs in cellular immunology was the
discovery that the influence of Mhc gene products on the
immune response stems mainly from the critical role they
play in the activation of regulatory T lymphocytes
(Benacerraf 1981; Heber-Katz et al. 1982, 1983). Immune T
cells are clonally specific and recognize foreign antigens
only in the context of appropriate Mhc molecules. The
discovery of this Mhc restriction was possible only because
Mhc molecules are polymorphic and T cells selected by an
antigen in the context of one polymorphic variant can be
activated only by the same combination of foreign antigen and
Mhc molecule (reviewed by Parham 1984). T cells must corecognize
antigen in association with one of these Mhc-encoded molecules
in order for activation to occur. Cytotoxic T cells prefer
class I molecules, whereas inducer T cells prefer class II
molecules. However, the relationship between the
antigen-specific and Mhc-specific recognition components of the
T-cell receptor remained speculative until the advent of T-cell
cloning. Kappler et al. (1981) fused two T-cell clones with
different specificities and asked whether the antigen- and
Mhc-specific components could segregate independently. A
hybridoma specific for ovalbumin (OVA) in association with the
I-A molecule of one haplotype was fused to a normal T-cell line
specific for keyhole limpet hemocyanin (KLH) in the context of
the I-A molecule of a second haplotype. The resulting cloned
somatic hybrid could be stimulated to secrete interleukin-2 by
either original pair of antigen and Ia molecule, but not by the
reciprocal combinations (OVA with the second I-A molecule or
KLH with the first). These results indicated that T cell
recognition of antigen was dependent on recognition of the Ia
molecules. The first convincing evidence that Ia molecules and
antigen interact with each other during the T-cell activation
process came from studies of B10.A mice immunized with pigeon
cytochrome c (Heber-Katz et al. 1982).
In defining the specificity of the response by using cytochrome
c from different species, it was noted that moth cytochrome c
and its C-terminal fragment always elicited a heteroclitic
response, i.e. they were more potent on a molar basis than the
immunogen, pigeon cytochrome c. Although most of the B10.A
(Eβᵏ:Eαᵏ) T-cell hybridomas specific for pigeon cytochrome c
could be stimulated by moth cytochrome c in association with
B10.A(5R) hybrid I-E (Eβᵇ:Eαᵏ) antigen-presenting cells, they
could not be stimulated by pigeon cytochrome c in the context
of the hybrid I-E. Antigen-presenting cells (APCs) carrying
other H-2 haplotypes, e.g. APCs from B10 and B10.A(4R) mice
(neither strain expresses I-E molecules), gave no stimulation.
Thus, these T-cell clones were able to recognize moth
cytochrome c associated with either Eβᵏ:Eαᵏ or Eβᵇ:Eαᵏ Ia
molecules. Other experimental evidence also suggested that
antigen recognition by cytotoxic T cells was fundamentally
similar to that of helper T cells (Hunig & Bevan 1982). Using
Ia-containing planar membranes as antigen-presenting particles
together with defined synthetic peptides, it was demonstrated
that Ia and "processed" antigen are the only requirements for
T cell recognition. That Ia and processed antigen interact
specifically prior to T cell recognition was supported by the
observation that antigens could compete with one another at
the level of antigen presentation in the absence of T cells
(reviewed by Buus et al. 1987). The first direct biochemical
evidence of a specific antigen/Mhc interaction came from
equilibrium dialysis studies using affinity-purified Mhc
molecules and a labeled synthetic peptide (Babbitt et al. 1985).
They demonstrated that the hen egg lysozyme (HEL) peptide 46-61
[HEL(46-61)] bound to I-Aᵏ but not to I-Aᵈ. This binding study
correlated with the finding that T cells specific for
HEL(46-61) from high-responder H-2ᵏ mice are restricted by
I-Aᵏ, whereas H-2ᵈ mice are low responders. These results
demonstrated a correlation between immunogenic peptide-Ia
interaction and Mhc restriction (Babbitt et al. 1985).
Furthermore, it was shown that the failure of pigeon cytochrome
c to be recognized in the context of the hybrid I-E molecule was
due to the inability of the hybrid I-E molecule to interact with
pigeon cytochrome c-derived synthetic peptides (Buus et al.
1987).
Each Mhc molecule binds many different peptides, using a
single binding site and probably through the recognition of
broadly defined motifs (Buus et al. 1987). This concept of a
single antigen-binding site is compatible with the recently
described X-ray crystallographic structure of a human class I
molecule (Bjorkman et al. 1987a, 1987b).

Genetic  Polymorphism  of  Mhc  Genes 

There are five distinguishing features of H-2 polymorphism
in wild mice that have been the subject of considerable
investigation. 1) There is a large number of alleles at each
genetic locus. The most polymorphic genetic loci known in the
mouse are located within the H-2 complex. Although at least 50
alleles have been detected for each of the H-2K and H-2D genes,
it is estimated that at least 100 alleles may exist for each of
these genes (Gotze et al. 1980; Klein & Figueroa 1981, 1986).
Other genes within the H-2 complex are also highly polymorphic,
but they tend to be less polymorphic than the H-2K and H-2D
genes. 2) Most if not all wild mice are heterozygous with
respect to H-2 class I and class II genes (Duncan et al. 1979;
Nadeau et al. 1981). This high level of heterozygosity is
unprecedented in the mouse and is mainly, if not entirely, a
result of the presence of a large number of alleles in wild
mouse populations. It was estimated that at least 90-95% of
wild mice are heterozygous at both the K and D loci and at
least 85% are heterozygous at the Ab and Eb loci (Duncan et al.
1979; Nadeau et al. 1981). These figures concur with the high
H-2 polymorphism estimated from the antigen and gene
frequencies (Klein 1986). 3) H-2 polymorphism occurs as
families of closely related alleles. Both amino acid and DNA
sequence analyses demonstrate that the similarity among H-2
genes and proteins is discontinuous (Wakeland et al. 1986). 4)
Both sequence and amino acid analyses of serologically and
biochemically indistinguishable class II molecules derived
from different subspecies suggest that they are identical
(Arden et al. 1980; Arden & Klein 1982). 5) There is a high
percentage of nucleotide differences between alleles of the
same locus. The nucleotide sequence variation can be as high as
5-10%, including the coding region (Benoist et al. 1983; Estess
et al. 1986).
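
The link between allele number and heterozygosity noted in point 2 can be illustrated with the standard population-genetic expression for expected heterozygosity, H = 1 - Σ pᵢ², where pᵢ is the frequency of allele i. The Python sketch below is not the method of the cited studies; it assumes random mating, and the function names are hypothetical. It shows only how quickly H approaches 1 as the number of alleles at a locus grows.

```python
def expected_heterozygosity(frequencies):
    """Expected heterozygosity H = 1 - sum(p_i ** 2).

    Assumes random mating; `frequencies` must sum to 1.
    """
    if abs(sum(frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in frequencies)


def equal_frequency_heterozygosity(n_alleles):
    """H for the idealized case of n equally frequent alleles (1 - 1/n)."""
    return expected_heterozygosity([1.0 / n_alleles] * n_alleles)


if __name__ == "__main__":
    for n in (2, 10, 50, 100):
        print(n, round(equal_frequency_heterozygosity(n), 3))
```

Equal frequencies give the maximum H for a given number of alleles, so observed values for real populations, where allele frequencies are uneven, would be expected to fall below this idealized figure.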

Mechanisms generating polymorphism of Mhc genes

Mutation. It is generally believed that the ultimate source
of genetic variation is mutation (Nei 1987b). There is no
evidence suggesting that the extensive diversity of the Mhc is
generated by a high mutation rate (Hayashida & Miyata 1983;
Klein 1987). Serologic typing of class II genes of wild mice
from global populations suggested that class II molecules can
be arranged into families of alleles, based on the antigenic
similarity and tryptic peptide fingerprints of I-A molecules
(Wakeland & Klein 1979; Wakeland & Klein 1983). Each family
consists of a cluster of closely related alleles. Tryptic
peptide fingerprinting comparisons of alleles within the same
family revealed that the contemporary Aa and Ab alleles arose
from common ancestors by multiple independent mutational
events (Wakeland & Darby 1983). Furthermore, radiochemical
sequence analysis of structural variants within a family
indicates that these I-A variants have diversified by
accumulating discrete mutations within the α1 and β1 domains
of I-A molecules (Wakeland et al. 1985). Similar conclusions
have been drawn from studies of human class II molecules
(Gustafsson et al. 1984).

Gene conversion. Gene conversion (hypermutational mechanism or
segmental exchange) is a process whereby the non-reciprocal
exchange of genetic information between two genes occurs
(Baltimore 1981). It differs from unequal crossing over in that
neither gene gains or loses genetic material. Classically, it
has been studied in allelic genes of fungi due to the ease of
tetrad analysis. However, a growing amount of evidence suggests
its existence in mammalian genomes (reviewed by Hansen et al.
1984). Analysis of the murine class I mutants has provided
compelling evidence for the occurrence of gene conversion-like
events in Mhc genes. Nathenson and his coworkers have
undertaken the painstaking structural analysis of a series of
mutant Kᵇ molecules (Geliebter et al. 1987). Four antigenically
important regions within the α1 and α2 domains of Kᵇ molecules
were revealed by the analysis. Alterations in these regions
result in the formation of new epitopes that are detectable by
graft rejection in vivo and by CTL in vitro. The results of
their analysis suggest that micro-recombinations between Kᵇ and
other class I genes may be responsible for the generation of
diversity of class I genes.
In most, if not all, mutants analyzed, the non-classical H-2
genes, i.e. the Qa and Tla region genes, were identified as the
donor genes that can recombine into and "mutate" H-2 genes.
There is evidence that gene conversion operates in H-2 class II
genes as well (Mengle-Gaw et al. 1984). The B6.C-H-2ᵇᵐ¹² (bm12)
mouse carries a mutant Mhc class II Abᵇ gene, derived by
spontaneous mutation from a (BALB/c x B6)F1 parent. The bm12
mutant and its B6 parent show reciprocal skin graft rejection
and a two-way mixed lymphocyte reaction (MLR). Genetic studies
and tryptic peptide mapping studies have concluded that the
Abᵇᵐ¹² gene of the bm12 mutant differs by only 3 nucleotides
from the Abᵇ gene of its B6 parent. By T cell proliferation
assays and monoclonal antibody-blocking studies, alloreactive
T cell clones were shown to recognize both an I-E molecule and
the mutant Aαᵇ:Aβᵇᵐ¹² molecule. Comparison of the Abᵇᵐ¹², Abᵇ
and Ebᵇ sequences indicates that the bm12 DNA sequence is
identical to the Ebᵇ sequence in the region where it differs
from Abᵇ. Furthermore, this region is flanked by stretches of
DNA sequence that are identical between Abᵇ and Ebᵇ. These
results suggest that the bm12 mutation arose by gene
conversion of this region of Ebᵇ into the corresponding region
of Abᵇ. The maximum extent of sequence transfer between Ebᵇ
and Abᵇ is estimated to be 44 nucleotides, but it could be as
little as 14 nucleotides. Evidence of segmental exchange has
also been provided by analyzing the exon sequences of eight
Ab alleles (McConnell et al. 1988). In an attempt to analyze
the association between exon and intron sequences, it was
noted that the exons of most alleles evolve in association with
their flanking intron sequence polymorphisms, with the
exception of two alleles (Figure 2-7). These two alleles appear
to be the products of intragenic segmental exchange (McConnell
et al. 1988).
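
The reasoning behind a minimum and a maximum conversion tract, as quoted above for bm12, can be sketched in code. The minimum tract spans the first to the last nucleotide at which the mutant differs from the recipient; the maximum extends that span outward through flanking positions where donor and recipient are identical, since a transfer there would leave no trace. The Python sketch below is an illustration under stated assumptions, not the analysis of the cited authors: the function name is hypothetical, the three sequences are assumed to be pre-aligned and of equal length, and the example sequences are invented toy strings, not Ab or Eb sequence.

```python
def conversion_tract_bounds(recipient, donor, mutant):
    """Return (min_length, max_length) of a single conversion tract.

    All three sequences must be aligned and of equal length.
    Returns None if the mutant is identical to the recipient.
    """
    if not (len(recipient) == len(donor) == len(mutant)):
        raise ValueError("sequences must be aligned to the same length")

    changed = [i for i, (r, m) in enumerate(zip(recipient, mutant)) if r != m]
    if not changed:
        return None

    first, last = changed[0], changed[-1]
    # Every position inside the minimal tract must carry the donor base,
    # otherwise one contiguous donor-derived tract cannot explain the mutant.
    if any(mutant[i] != donor[i] for i in range(first, last + 1)):
        raise ValueError("changes are not explained by a single donor tract")

    # Extend through flanking positions where donor and recipient agree.
    left = first
    while left > 0 and donor[left - 1] == recipient[left - 1]:
        left -= 1
    right = last
    while right < len(recipient) - 1 and donor[right + 1] == recipient[right + 1]:
        right += 1

    return (last - first + 1, right - left + 1)


if __name__ == "__main__":
    # Invented toy sequences for illustration only.
    recipient = "AAAACCCCGGGGTTTT"
    donor = "TAAACCTCGAGGTTTA"
    mutant = "AAAACCTCGAGGTTTT"
    print(conversion_tract_bounds(recipient, donor, mutant))
```

In the toy example the mutant differs from the recipient at two positions four nucleotides apart, while donor and recipient stay identical for several positions on either side, so the tract is bounded between 4 and 14 nucleotides.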


Trans-specific evolution. The evolutionary rate of Mhc
loci is not higher than that of other loci (Hayashida &
Miyata 1983). Although the presumed rapid diversification
within species can be explained by mechanisms such as gene
conversion, an alternative hypothesis has been proposed by
Klein et al. (1980, 1987). According to this hypothesis, Mhc
polymorphism evolves in a trans-species mode, starting with a
number of major alleles that are passed on in phylogeny from
one species to another. During the evolutionary process the
alleles accumulate mutations, which result in the extensive
diversity of Mhc genes. There is mounting evidence supporting
this hypothesis. McConnell et al. (1988) assembled a collection
of 49 H-2 haplotypes derived from five Mus species and
subspecies: Mus m. musculus, Mus m. domesticus, Mus m.
castaneus, Mus spicilegus, and Mus spretus. A total of 31 Ab
alleles was defined by restriction fragment length polymorphism
(RFLP) analysis. Based on the degree of sequence divergence,
these 31 alleles can be divided into three distinct
evolutionary lineages. Most of these alleles (28 out of 31)
were in either lineage 1 or 2, both of which consisted of
alleles derived from 4 separate Mus species (Table 2-1 and
Figure 2-8). These findings are consistent with the
trans-species evolution of the Ab gene and contrast with data
obtained when other nuclear genes or mitochondrial DNA (mtDNA)
polymorphisms were analyzed in mice from the same populations.
Genomic sequence comparisons of two Ab alleles show that the
region of highest divergence between them occurs in the intron
separating the β1 and β2 exons (Figure 2-9). One of the alleles
contains an additional 861 bp of inserted sequence, which is
composed of short interspersed repetitive elements (SINEs),
commonly termed retroposons. The relationship of this
retroposon polymorphism to the evolutionary lineages defined
above was tested by genomic restriction mapping of Ab genes
from lineages 1 and 2. The results indicated that the 861 bp
retroposon insertion is characteristic of lineage 2 alleles.
Using the SINE sequence as an evolutionary tag, it is estimated
that the Ab alleles in these two lineages diverged at least 0.4
million years ago and have survived the speciation events
leading to several Mus musculus subspecies.
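
The logic of using the retroposon as an evolutionary tag can be sketched as a simple grouping exercise: alleles are assigned to a lineage by the presence or absence of the insert, and a lineage is taken as evidence for trans-species inheritance when it contains alleles from more than one species. The Python sketch below is only an illustration. The allele identifiers and the table are invented placeholders (only the taxon names come from the text), the function names are hypothetical, and the sketch ignores lineage 3.

```python
from collections import defaultdict


def assign_lineage(has_retroposon):
    """Lineage 2 alleles carry the 861 bp retroposon; lineage 1 alleles do not.

    Simplification: lineage 3 is not modeled here.
    """
    return 2 if has_retroposon else 1


def species_by_lineage(alleles):
    """Map each lineage to the set of species in which it was found."""
    groups = defaultdict(set)
    for allele in alleles:
        groups[assign_lineage(allele["has_retroposon"])].add(allele["species"])
    return dict(groups)


def trans_species_lineages(alleles, min_species=2):
    """Lineages whose alleles occur in at least `min_species` species."""
    groups = species_by_lineage(alleles)
    return sorted(l for l, species in groups.items() if len(species) >= min_species)


if __name__ == "__main__":
    # Invented placeholder data, not results from the cited study.
    toy_alleles = [
        {"id": "allele_1", "species": "Mus m. domesticus", "has_retroposon": False},
        {"id": "allele_2", "species": "Mus spretus", "has_retroposon": False},
        {"id": "allele_3", "species": "Mus m. musculus", "has_retroposon": True},
        {"id": "allele_4", "species": "Mus spicilegus", "has_retroposon": True},
    ]
    print(trans_species_lineages(toy_alleles))
```

A lineage shared by several species implies that the insertion, and therefore the split between the lineages, predates the divergence of those species, which is the basis of the minimum divergence time quoted above.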

These studies are further supported by the work of
Figueroa et al. (1988). They showed that the molecules
encoded by alleles of the Ab locus fall into two groups defined
by their reactions with monoclonal antibodies. One group
reacts with antibodies specific for the antigenic determinant
H-2A.m25; the other with antibodies specific for determinant
H-2A.m27. This serological reactivity pattern correlates with
a specific structural feature of the proteins encoded by the Ab
genes. Sequence comparison of Ab genes derived from inbred and
wild strains has revealed that m27-positive proteins have two
amino acids deleted, at positions 65 and 67 encoded by the β1
exon, while m25 antibodies react with Aβ chains that do not
have the deletions. No Ab molecules were ever found to be
positive or negative for both antibodies simultaneously. The
perfect correlation between the serological pattern and the
presence or absence of the two deletions has been confirmed by
testing a panel of Ab alleles in Northern blot analysis
(Figueroa et al. 1988). The same deletion polymorphisms also
exist in other species distantly related to the M. musculus
complex, such as M. caroli and M. pahari, which are estimated
to have separated from the M. musculus complex 1.7 and 4.8
million years ago, respectively. Furthermore, the non-deleted
and deleted forms of Ab genes have also been shown to be
present in inbred strains of rat, a rodent genus closely
related to the genus Mus. They conclude that the codon deletion
polymorphisms are shared not only by different species of the
same genus but also by different genera of the same order.

Comparisons of class I Mhc alleles in two closely
related species, humans (Homo sapiens) and chimpanzees (Pan
troglodytes), have also indicated a trans-species mode of
evolution in this family of genes (Lawlor et al. 1988; Mayer
et al. 1988). No features distinguish human alleles from
chimpanzee alleles. Individual HLA-A or B alleles are
more closely related to individual chimpanzee alleles than to
other HLA-A or B alleles. These studies support the notion
that a considerable proportion of contemporary HLA-A and B
polymorphisms existed before divergence of the chimpanzee and
human lines. A recent report indicates that as many as 30%
of Asian wild mice (e.g. Mus m. musculus, Mus m. domesticus,
Mus m. castaneus) carry an H-2K^ antigen detected by an
alloantiserum specific for an H-2 class I gene (Sagai et al.
1989). The H-2K^ antigen was further characterized with a panel
of monoclonal antibodies and by restriction enzyme analysis with
an H-2K locus-specific probe for the 3' end of H-2K. A
characteristic RFLP pattern was always found to be associated
with a particular monoclonal antibody reactivity pattern. The
concordance between the presence of the antigenic determinant and
a particular RFLP pattern was observed not only in Mus musculus
subspecies but also in M. spretus. These results indicated
that the antigenic determinant reactive with the monoclonal
antibodies is an ancient polymorphic structure that has
survived speciation in the genus Mus and is closely
associated with a stable DNA segment at the 3' end of the H-2K
gene.

Intra-exonic recombination. A recent study of Mhc class
II Ab genes indicated that another mechanism is mainly
responsible for the genetic diversity of Mhc genes (She et al.
1990b). A panel of 52 different alleles derived from
laboratory inbred mice as well as various species of mice and
rats was analyzed for Aβ2 nucleotide sequence.
Examination of the patterns of sequence polymorphism revealed
that the majority of sequence diversity was localized in five
subdomains, each of which has several polymorphic sequence
motifs. On the basis of the hypothetical
three-dimensional structural model of class II molecules
(Brown et al. 1988), these polymorphic sequence motifs are
located in the regions encoding the ABS. Across the
whole Aβ2 exon, a specific sequence motif
could associate with several different motifs from other
subdomains to form an allele. This observation indicated that
the diversification of Aβ2 exons resulted from intra-exonic
recombination, which shuffled these motifs into various
combinations (Wakeland et al. 1990a; She et al. 1990b).
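
The motif-shuffling pattern described above can be illustrated
with a small sketch. The allele names and motif labels below are
invented for illustration and are not data from She et al.
(1990b); each hypothetical allele is written as a tuple of motif
labels, one per subdomain. The function reports every motif that
is found with more than one partner motif in some other
subdomain, which is the signature the text attributes to
intra-exonic recombination.

```python
from collections import defaultdict

# Hypothetical alleles: one motif label per subdomain (invented, not real data).
ALLELES = {
    "allele_1": ("a1", "b1", "c1"),
    "allele_2": ("a1", "b2", "c1"),
    "allele_3": ("a2", "b1", "c2"),
    "allele_4": ("a2", "b2", "c1"),
}


def motif_partners(alleles):
    """Map (subdomain, motif) -> set of (other subdomain, motif) seen with it."""
    partners = defaultdict(set)
    for motifs in alleles.values():
        for i, motif in enumerate(motifs):
            for j, other in enumerate(motifs):
                if i != j:
                    partners[(i, motif)].add((j, other))
    return partners


def shuffled_motifs(alleles):
    """Return motifs that pair with more than one motif in another subdomain."""
    result = set()
    for (_, motif), seen in motif_partners(alleles).items():
        per_subdomain = defaultdict(set)
        for j, other in seen:
            per_subdomain[j].add(other)
        if any(len(group) > 1 for group in per_subdomain.values()):
            result.add(motif)
    return result


if __name__ == "__main__":
    print(sorted(shuffled_motifs(ALLELES)))
```

In this toy panel every motif except c2, which occurs in a single
allele, is found in more than one combination.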

Mechanisms  that  maintain  Mhc  polymorphisms 

Although  a  variety  of  data  indicate  that  Mhc  polymorphism 
is  maintained  by  some  type  of  balancing  selection,  the  precise 
mechanisms involved have remained controversial. Two forms
of balancing selection, overdominance and frequency-dependent
selection, have been proposed to account for the unprecedented
genetic diversity of Mhc genes.

Overdominant selection (heterozygous advantage). The
maintenance of Mhc polymorphism by overdominant selection was
first proposed by Doherty and Zinkernagel (Doherty &
Zinkernagel 1975). It is based on the well-established
experimental observation that Mhc-linked responsiveness is a
dominant (or codominant) genetic trait (Benacerraf & Germain
1978). Mhc heterozygotes are capable of responding to any
antigen recognized by either parental Mhc haplotype, since
Mhc molecules encoded by both Mhc haplotypes are coexpressed
on the surfaces of antigen-presenting cells (Benacerraf &
Germain 1978). Hughes & Nei (1988) examined the pattern of
nucleotide substitution in the region of the ABS, involving the
57 polymorphic amino acid residues, and in other regions of Mhc
class I alleles of both humans and mice. Their study is based
on the theoretical prediction that in the presence of
overdominant selection the rate of codon substitution is
increased compared with that for neutral alleles, and that only
nonsynonymous substitutions would be subject to overdominant
selection, as synonymous substitutions are more or less neutral
(Maruyama & Nei 1981). This increase in the rate of codon
substitution is due to the selective advantage of
heterozygotes carrying the new mutant allele. Their results
indicate that in the ABS the rate of nonsynonymous
substitution is higher than that of synonymous substitution,
whereas in other regions the reverse is true. In a later study
(Hughes & Nei 1989), the same type of analysis was extended to
class II Mhc genes. It was concluded that the unusually high
degree of polymorphism at class II Mhc loci is caused mainly
by overdominant selection operating in the ABS. Therefore,
the biological basis of overdominant selection for class II
Mhc loci seems to be similar to that for class I Mhc loci.
A mathematical study of the overdominant selection model also
indicates that it can maintain polymorphic allelic lineages
for a long time and thus provides a sufficient explanation for
the trans-species evolution of Mhc genes (Takahata & Nei 1990).
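
How heterozygote advantage retains alleles can be seen in a
minimal deterministic sketch. It is not the model analyzed by
Takahata & Nei (1990); it assumes an infinite random-mating
population in which every heterozygote has fitness 1 and every
homozygote has fitness 1 - s. The starting frequencies and the
value of s are arbitrary illustrative choices.

```python
def next_generation(freqs, s):
    """One generation of selection under symmetric overdominance.

    Assumed fitness scheme: heterozygotes 1, homozygotes 1 - s.
    """
    # Under random mating the marginal fitness of allele i is
    # p_i * (1 - s) + (1 - p_i) * 1 = 1 - s * p_i.
    marginal = [1.0 - s * p for p in freqs]
    mean_fitness = sum(p * w for p, w in zip(freqs, marginal))
    return [p * w / mean_fitness for p, w in zip(freqs, marginal)]


def simulate(freqs, s, generations):
    """Iterate the selection recursion for a number of generations."""
    for _ in range(generations):
        freqs = next_generation(freqs, s)
    return freqs


if __name__ == "__main__":
    start = [0.90, 0.09, 0.01]  # one common allele and two rarer ones
    end = simulate(start, s=0.1, generations=2000)
    print([round(p, 3) for p in end])
```

Under these assumptions the rare alleles increase and all three
frequencies approach 1/3, so no allele is lost by selection.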

Frequency-dependent selection. Initially it was
speculated that Mhc alleles generate heterozygote disadvantage
in association with infectious diseases and that some kind of
frequency-dependent selection is required to maintain the high
degree of polymorphism (Bodmer 1972). A pathogen adaptation
model was suggested as one form of frequency-dependent
selection (Snell 1968; Bodmer 1972). This model is based on
the assumption that host individuals carrying new antigens,
which have arisen recently by mutation, will be at an
advantage because pathogens will not have had time to
adapt to infecting cells bearing the new antigens. This
generates a form of frequency-dependent selection in which a
new Mhc allele initially has a selective advantage over an old
allele, but the advantage gradually declines with time. The
model also suggests that in the presence of
pathogen adaptation the average heterozygosity, the number of
alleles, and the rate of codon substitution will increase
compared with those for neutral alleles.

Rare allele advantage. Another model of frequency-dependent
selection is rare allele advantage. This hypothesis
is based on the notion that endemic pathogens, which evolve
much more rapidly than their vertebrate hosts, will tend to
adapt their antigenicity to minimize immune recognition by the
most prevalent Mhc genotypes in a population. Consequently,
new or rare Mhc alleles will have a selective advantage due
to increased resistance to prevalent pathogens. This model
predicts cyclic fluctuations in the frequencies of Mhc alleles
as pathogens are driven to evolve antigenicity that evades the
immune responsiveness of a series of new "prevalent" alleles.
This model can explain the maintenance and long persistence
of polymorphic alleles by rescuing rare alleles from
extinction (Wakeland et al. 1990).

Recombination  Within  the  Mhc 

Recombinational  hot  spot  within  I  region 

The genetic material is a dynamic structure that
reorganizes during evolution and differentiation. Nucleotide
sequences are rearranged by recombination between homologous
or non-homologous sequences. While homologous equal
recombination breaks and rejoins chromosomes at precisely the
same position, unequal recombination between homologous
sequences in different positions leads to duplications and
deletions. Over the last ten years recombinant mouse strains
have been analyzed by RFLP analysis and DNA sequencing to map
crossovers in the I region (Steinmetz et al. 1982a). These
studies have shown that recombination within the I region is
not random but localized to specific sites, which
have been termed recombination hot spots (RHS) (Steinmetz et
al. 1982a). The first such RHS, localized within the intron
between the second and third exons of the Eb gene, was identified
from analysis of six intra-I region recombinant mouse strains
(Kobori et al. 1984). Since then, three additional RHSs have
been identified within the Mhc, at K/Pb, Pb/Ob
(Steinmetz et al. 1986; Uematsu et al. 1986), and Ea (Lafuse
& David 1986) (Figure 2-10). RFLP analysis indicates that
recombination within the Pb/Ob, Eb, and Ea hot spots is
reciprocal (Steinmetz et al. 1982a; Steinmetz et al. 1987;
Lafuse & David 1987). Analysis of secondary recombinant strains
indicates that chromosomes that have recently undergone a
recombination event are unstable and quite likely to undergo a
second recombination in the next generation (Lafuse & David 1987).

Molecular  basis  of  recombinational  hotspots 

In the human genome, recombinational hotspots mainly
occur in regions containing hypervariable minisatellite
sequences. These minisatellite sequences are composed of
tandem repeats and occur at multiple locations. The repeat
unit contains a common 16-bp core sequence, GGAGGTGGGCAGGARG.
DNA sequence searches of the Pb/Ob and Eb recombinational
hotspots have found short repeated sequences with some
homology to the recombination signal Chi (GCTGGTGG) of phage
lambda: (CAGA)6 in the Pb/Ob hotspot and (CAGG)7-9 in the Eb
hotspot (Steinmetz et al. 1986). The CAGG repeated sequence
identified in the Eb hotspot exhibits significant homology
to the human minisatellite core sequence and thus may
represent a murine minisatellite (Steinmetz et al. 1987).
Recently, a female-specific recombination hotspot has been
mapped to a 1 kb region of DNA between the Pb and Ob genes
(Shiroishi et al. 1990). This hotspot predominantly occurs
in crosses between Japanese wild mice, Mus musculus molossinus,
and laboratory haplotypes. Its location overlaps with a
sex-independent hotspot previously identified in the Mus musculus
castaneus CAS3 haplotype. Sequence comparisons between DNA
surrounding this hotspot and corresponding regions from other
strains, including B10.A, C57BL/10, CAS3 and C57BL/6, revealed
no significant difference. However, sequence comparison of this
Pb/Ob hotspot with a hotspot in Eb indicated that both have
a very similar molecular structure. Each hotspot is composed
of two elements, a mouse middle repetitive MT family sequence
and the tetrameric repeated sequence, separated by 1 kb of
DNA (Shiroishi et al. 1990).
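
The kind of sequence search described above can be sketched as
follows. This is an illustration only, not the procedure used in
the cited studies: it scans a sequence for runs of a tetrameric
repeat and for exact matches to the 16-bp minisatellite core,
reading R as the IUPAC code for a purine (A or G). The function
names and the test sequence are invented.

```python
import re

# IUPAC nucleotide codes needed here; R stands for a purine (A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]"}


def iupac_to_regex(pattern):
    """Translate a sequence containing IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in pattern)


def find_tandem_repeats(seq, unit, min_copies):
    """Return (start, copies) for each run of at least min_copies tandem units."""
    run = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    return [(m.start(), len(m.group()) // len(unit)) for m in run.finditer(seq)]


def find_core(seq, core="GGAGGTGGGCAGGARG"):
    """Return start positions of exact matches to the degenerate core sequence."""
    return [m.start() for m in re.finditer(iupac_to_regex(core), seq)]


if __name__ == "__main__":
    # Invented sequence for demonstration, not a real hotspot sequence.
    seq = "TTT" + "CAGG" * 8 + "ACGT" + "GGAGGTGGGCAGGAAG" + "CAGA" * 6
    print(find_tandem_repeats(seq, "CAGG", 7))
    print(find_tandem_repeats(seq, "CAGA", 6))
    print(find_core(seq))
```

Positions are 0-based. An exact-match search of this kind would
miss the partial homologies discussed in the text, which would
require an alignment or mismatch-tolerant search.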

Definition  of  Evolutionary  Lineage 

The evolutionary lineages of Ab were initially defined by
RFLP analysis of 31 Ab alleles from 5 different Mus species
(McConnell et al. 1988). These 31 alleles were ordered into
three distinct lineages by calculating the fractions of
restriction fragments (F) (Nei & Li 1979) and sites (S) shared,
which are used to estimate genomic sequence divergence
(Table 2-1). Sequence comparisons of lineage 1 (Ab*^) and
lineage 2 (Ab'') alleles indicated that the major DNA sequence
polymorphism between these two lineages occurs in intron
2, between the β1 and β2 exons (Figure 2-9). The sequence homology
in this intron is <90%, and the Ab'' gene contains an extra 861 bp
of retroposon sequence, flanked by 13 bp direct repeats
(ATGTATGCTGTTT). The host-derived nature of this direct
repeat sequence indicates that the 861 bp retroposon was
inserted into this position as a random event during the
evolutionary divergence of Ab genes. Inspection of genomic
restriction maps of alleles derived from separate Mus species
indicates that the retroposon insertion is characteristic of
lineage 2 alleles (McConnell et al. 1988). These results
indicate that the evolutionary lineages defined by RFLP analysis
reflect alleles with different retroposon polymorphisms.
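
For orientation, the shared-fraction statistic of Nei & Li (1979)
can be sketched as below. The same form, 2 x (number shared) /
(total in both alleles), gives F when applied to fragments and S
when applied to sites. The divergence estimate d = -ln(S)/r is
the site-based estimator as it is usually quoted, with r the
recognition sequence length; the default r = 6 and the fragment
sizes are assumptions for illustration, not values from
McConnell et al. (1988).

```python
import math


def shared_fraction(items_x, items_y):
    """Nei & Li (1979) similarity: 2 * shared / (n_x + n_y).

    Items are compared as sets, so identical entries within one
    allele are counted once.
    """
    x, y = set(items_x), set(items_y)
    return 2.0 * len(x & y) / (len(x) + len(y))


def divergence_from_sites(s, r=6):
    """Estimate substitutions per site from shared sites: d = -ln(S) / r.

    r is the length of the restriction enzyme recognition sequence;
    6 is an assumed default.
    """
    return -math.log(s) / r


if __name__ == "__main__":
    # Invented fragment sizes (bp) for two hypothetical alleles.
    allele_x = [5200, 3100, 1800, 950]
    allele_y = [5200, 2400, 1800, 950, 700]
    print(round(shared_fraction(allele_x, allele_y), 3))
```

Here three fragments are shared out of four and five, giving
F = 6/9.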


Structure  and  Evolution  of  Retroposon 


Before DNA cloning became a major tool for studying
gene structure and function, chromosome renaturation
experiments showed that most organisms possess short stretches
of moderately repeated DNA (mrDNA) separated by longer
sequences of low copy number (Davidson and Britten 1979). In
mammals, most of the mrDNA is composed of retroposons, some
of which are thought to represent mobile genetic elements
that use RNA intermediates in their replication (Jagadeeswaran
et al. 1981). These mrDNAs belong to different sequence
families in different mammalian orders (reviewed by Rogers
1985). The majority of mammalian interspersed repeated DNA
falls into two families, referred to as short and long
interspersed nucleotide elements, SINEs and LINEs,
respectively (Singer 1982). The "generic" SINE sequence
contains an internal RNA polymerase III promoter, an A-rich
3' end, and flanking direct repeats. SINEs
typically range from 75 to as much as 500 bp in length. All
nonviral retroposons correspond to a partial or complete DNA
copy of a cellular RNA species. With a few exceptions,
nonviral retroposons are derived from fully processed RNAs
(reviewed by Weiner et al. 1986).

Structure  of  Alu  and  "Alu-like"  Family 

The first well-characterized and the most abundant
repeated DNA family in primates is the Alu family, which
constitutes most of the dispersed, repeated DNA (Houck et al.
1979). The 500,000 Alu elements in the human genome constitute
5-6% of the genome by size, occurring on average every 5-9 kb and
differing on average by 13% from the consensus sequence
(Schmid & Jelinek 1982; Rinehart et al. 1981). Other SINE
families are referred to as "Alu-like" or "Alu-equivalent"
families. Mice, rats, and hamsters all contain two abundant
"Alu-like" families, B1 and B2 (Kramerov et al. 1979; Krayev
et al. 1980; Haynes et al. 1981). The Alu elements,
approximately 300 bp long, were so named because they contain
a distinctive Alu I cleavage site. Regions of direct internal
repetition within Alu sequences indicate that the Alu element
is composed of two incompletely homologous arms, an
approximately 130 bp left arm and a right arm that differs
from the left by an insertion of 31 bp (reviewed by Doolittle
1985). Although human Alu sequences are dimeric, the
homologous rodent sequences (the B1 superfamily) are
monomeric. It is believed that both Alu and B1 sequences were
derived independently from 7SL RNA, as the 7SL RNA gene has about
150 bp in the middle that is not found in the Alu family (Ullu
et al. 1985; Weiner et al. 1986). 7SL RNA is a component of the
signal recognition particle, required for cotranslational
secretion of proteins into the lumen of the rough endoplasmic
reticulum (Walter & Blobel 1982), and is highly conserved
throughout evolution. Alu-like sequences, and retroposons in
general, have a strong tendency to insert into each other's
(A)-rich tails. This has apparently generated composites
which are themselves propagated as single retroposons
(Jagadeeswaran et al. 1981; Haynes et al. 1981).
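
The copy number, element length, and genome fraction quoted above
for the Alu family can be checked against one another with simple
arithmetic. The haploid genome size of 3 x 10^9 bp used below is
an assumed round figure that does not appear in the text; the
other two values are taken from it.

```python
ALU_COPIES = 500_000     # copy number quoted in the text
ALU_LENGTH_BP = 300      # approximate element length quoted in the text
GENOME_SIZE_BP = 3.0e9   # assumed haploid human genome size (not from the text)


def genome_fraction(copies, length_bp, genome_bp):
    """Fraction of the genome occupied by the repeat family."""
    return copies * length_bp / genome_bp


def mean_spacing_kb(copies, genome_bp):
    """Average distance between successive copies, in kilobases."""
    return genome_bp / copies / 1000.0


if __name__ == "__main__":
    print(genome_fraction(ALU_COPIES, ALU_LENGTH_BP, GENOME_SIZE_BP))
    print(mean_spacing_kb(ALU_COPIES, GENOME_SIZE_BP))
```

With the assumed genome size these give about 5% of the genome
and one element every 6 kb on average, in line with the 5-6% and
5-9 kb figures quoted.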

Mechanisms  of  Retroposition 
Transcription  by  polymerase  III 

The basic model for retroposition of SINEs involves RNA
polymerase III transcription of the repeat, reverse transcription
of the RNA, and integration into the genome (Figure 2-11).
All SINEs contain an internal RNA polymerase III split
promoter (Galli et al. 1981). In vitro transcription
experiments have shown that the 5' end of SINE transcripts
coincides exactly with the left end of the repeated DNA
sequence. These results led to the proposal that
SINEs propagate via RNA-mediated retroposition (Jagadeeswaran
et al. 1982). SINE family members are able to produce
transcripts in vivo, and their transcription is regulated in a
tissue-specific manner. The homogeneous size of Alu transcripts
indicates that one or a few identical family members are
transcribed (Watson & Sutcliffe 1987). Transcription of the
7SL RNA gene requires a specific 37-bp upstream sequence in
addition to its internal promoter (Ullu & Weiner 1985). Since
the Alu family has evolved from 7SL RNA, its promoter may
similarly depend on such upstream sequences. A critical step
in promoting efficient SINE retroposition may be mutations
that render the promoter independent of flanking sequence.
However, the established chromatin structure and environment
in which the SINE member is situated may have a regulatory
effect on the transcription of SINE family members. In
transfection assays, an introduced SINE
member was transcriptionally active in transient assays but
silent in long-term transformants. These results also support
the concept that the internal promoter is not sufficient by
itself in vivo (reviewed by Deininger 1990).


Figure 2-11. A proposed mechanism for SINE retroposition.
The first step is transcription of the repeated DNA sequence.
The repeat is represented by a heavy line, its flanking
sequence by thinner lines, and the transcript by a wavy line.
Transcription initiates at the beginning of the repeat,
adjacent to the flanking direct repeat (double solid arrows),
continues through the entire repeat, and terminates in
flanking sequence. This transcript is suggested to be capable
of self-priming reverse transcription by priming with its
terminal U residues on the 3' A-rich region of the repeat
transcript. Removal of the RNA will then leave a single-stranded
cDNA copy of the entire repeat with no flanking
sequences. This cDNA must then integrate into a genomic site
with staggered nicks. It is hypothesized that A richness
at the nick site may interact with the T-rich cDNA end to
stabilize the interaction. Repair synthesis at the junctions
will then result in formation of a newly integrated repeated
DNA family member with a different flanking direct
repeat (double hollow arrows). Many of these steps are
hypothetical and a number of alternatives are possible.
Adapted from Deininger (1989).

Termination of transcription

Most SINEs do not contain a termination signal for RNA
polymerase III (Fuhrman et al. 1981). Transcription starts
at the 5' end of the SINE, runs through the entire repeat, and
terminates in the flanking sequence by chance, since the
consensus sequence for termination is four or more T's in a row
(Bogenhagen et al. 1980). Most in vivo SINE transcripts appear
to be polyadenylated (Deininger 1990).
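
Because the terminator is simply a run of T's, the point at which
a transcript ends depends on the sequence that happens to lie
downstream of each SINE copy. A minimal sketch of locating the
first such run in a flanking sequence is given below; the
function name and example sequences are invented, and the
threshold of four T's follows the consensus stated above.

```python
import re


def first_terminator(flank, min_t=4):
    """Locate the first run of at least min_t T's in a flanking sequence.

    `flank` is the DNA downstream of the repeat, written as the
    non-template strand. Returns (start, end) of the run using
    0-based, end-exclusive coordinates, or None if no run is found.
    """
    match = re.search("T{%d,}" % min_t, flank)
    return (match.start(), match.end()) if match else None


if __name__ == "__main__":
    print(first_terminator("ACGTTGCATTTTTGCA"))
    print(first_terminator("ACGTTTACG"))
```

Two copies of the same element inserted at different genomic
sites would therefore be expected to yield transcripts with
different 3' extensions.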

Reverse  transcription 

Since the transcripts of SINE family members normally
possess a poly(A) tract, they may be able to self-prime their
reverse transcription (Jagadeeswaran et al. 1981). Moreover,
RNA polymerase III transcripts should have three or more
U's at their 3' end, which may fold back onto the A-rich region
and prime reverse transcription (Bogenhagen et al. 1980).
Reverse transcription could also be primed by an intermolecular
interaction, for instance, using the 3' end of another
transcript through the (A)-rich region (VanArsdell et al. 1981).
The source of reverse transcriptase, which must be active in the
germ line, is unknown. One possible source is the intracisternal
A particles (IAP), which contain reverse
transcriptase (Wilson & Kuff 1972) and are active in early
embryos (Kelly and Condamine 1982). Alternatively, it may be
provided during retroviral infections or by endogenous retroviral
sequences (Martin et al. 1981). Small RNA molecules can be
packaged into retroviral particles and be reverse transcribed
(Linial et al. 1978). Packaging should facilitate reverse
transcription and may account for the high efficiency of SINE
retroposition. Packaging may also promote an "infection-like"
process that allows RNA made in somatic cells to enter the
germ line (Vanin 1984).

Integration 

To facilitate the integration process, the genome must
be nicked to allow the entry of new sequences, followed by
repair synthesis that generates direct repeats at the integration
sites. The direct repeats generated are generally rich in A
residues and vary widely in length, suggesting that SINEs do
not use specific integration enzymes but instead take
advantage of nicks generated by other nonspecific enzymes.
Topoisomerases, enzymes that relax the genome during
replication and transcription, have been shown to have nicking
activity in a SINE family member in vitro (Perez-Stable et al.
1984). Although topoisomerase I is generally thought to be
nonspecific in its nicking activity, hot spots for DNA
cleavage have been reported (Busk et al. 1987). These sites
are A rich and at least partially resemble the 3' terminus and
direct repeats of SINEs. Not only are the integration sites
of SINEs A rich, but the A richness is predominantly at the
left end of the direct repeat (Daniels & Deininger 1985;
Rogers et al. 1986). These findings have several
ramifications. First, they show that integration is not
random. Second, since the 3' ends of the SINE families are
generally A rich, when they integrate into a new site they
generally make that site even more A rich. Therefore, the 3'
ends of SINEs are improved integration sites for more SINE
copies, resulting in a tendency to form perfect tandem dimers
(reviewed by Rogers 1985). In several examples, it appears
that the integration of one element abutting another formed a
composite that could then retropose as a single unit
(Daniels & Deininger 1983).

Functions  Attributable  to  SINE 

It has been assumed that the broad genomic distribution and
high copy number of SINEs reflect an important cellular function.
It has also been argued that these repetitive elements are
selfish DNA whose self-propagation provides no benefit to
their hosts (Doolittle and Sapienza 1980; Orgel and Crick
1980). SINEs have been implicated in a number of effects on
genome structure and evolution. For example, SINEs may
promote deletion or facilitate recombination (Lehrman et al.
1987), act as limits to gene conversion (Hess et al. 1983), and
move unrelated DNA segments throughout the genome via
retroposition of sequences adjacent to SINEs (Zelnick et al.
1987). They may simply affect the long-term adaptability of
the species.

Recombination

Recombination involving Alu repeats has resulted in
phenotypic changes. For example, at least two different forms
of globin gene defect occur in a pair of inverted Alu
repeats and result in deletion of the gene. The LDL receptor
gene has a number of dispersed Alu repeats in its introns, 3'
noncoding region, and flanking regions. Five naturally
occurring insertion/deletion mutants of this gene
produce defective receptors, four of which involve Alu-Alu
recombination (Horsthemke et al. 1987).

Suppression  of  gene  conversion 

Examination of regions of the globin genes has provided
evidence that SINEs can help to limit gene conversion events
(Hess et al. 1983; Schimenti & Duncan 1984). The globin genes
constitute a multigene family whose members began to diverge
after duplication. By limiting the extent of gene conversion,
SINE sequences may promote gene diversification and the
evolution of new functions (Deininger 1990).

Mobilization  of  DNA  sequence 

Several composite SINE families were formed by fusion of new
sequences with a SINE to become a functionally transposing
unit, indicating that SINEs have the potential to mobilize other
sequences (reviewed by Weiner et al. 1986). In one
example, a genomic non-repetitive sequence that lay between
two artiodactyl SINEs retroposed with them as a unit,
resulting in its duplication within the cow haploid genome
(Zelnick et al. 1987).

In vitro transfection experiments also indicated that
SINEs might repress or activate transcription initiated by an
adjacent RNA polymerase II promoter (McKinnon et al. 1986).
Another function conferred by certain SINEs is to encode
portions of polypeptides. Alu dispersed repeats account for
32 codons of the 3' portion of the genes for decay-accelerating
factor and for a B-cell growth factor (Caras et al. 1987; Sharma
et al. 1987). The CCAAT box of the e-globin gene in primates is
part of an Alu repeat sequence (Kim et al. 1989). Some SINEs
are found in 3' noncoding exons and provide a
polyadenylation signal (Krane & Hardison 1990). Thus,
functional sequences provided by SINEs include promoter, RNA
processing, and protein-coding sequences.

Evolution of Introns

Mammalian genes are discontinuous, broken up along the
DNA into alternating regions: coding sequences, or exons, which
are interspaced with noncoding sequences, or introns, that
are spliced out of the primary transcript. An intriguing
question regarding introns is what advantages or functions
they provide to the cell. There has been ample
speculation about the origin and maintenance of introns in
eukaryotic genomes. Gilbert (1978, 1985) proposed the "exon
shuffling" hypothesis, which states that introns provide an
evolutionary advantage by allowing recombination within intron
sequences, and that introns in modern genomes are remnants
of the recombination process that speeds up evolution. The
observations that exons often correlate with functional
domains and that homologous exons can be found in
different genes have been used to support this idea.

Examinations of genes coding for certain ubiquitous enzymes,
such as triosephosphate isomerase, whose sequence is highly
conserved across species, have revealed that the intron
positions are not random and that all of these introns were in
place before the divergence of plants and animals (Gilbert et
al. 1986). In this view, the introns were lost from prokaryotes
as their genomes became streamlined for rapid DNA replication
(Doolittle 1978). After the discovery of introns, a number of
authors suggested that introns might represent the vestiges of
transposable elements that had been inserted into genes
(Cavalier-Smith 1985; Hickey & Benkel 1986). Although there is
evidence that many, if not all, introns are dispensable (Ng et
al. 1985), there is also evidence that the internal sequences
of introns are important for splicing (Rautmann & Breathnach
1985). Cech (1986) has suggested that all RNA splicing
reactions are evolutionarily related, with the exception of
those involving some pre-tRNAs. This evolutionary link between
different intron classes implies that the introns of nuclear
protein-coding genes were also capable of replicative
transposition at some stage in their evolutionary history.

Hickey & Benkel (1986) have suggested a model to account for
the evolutionary origin of introns. The main points of this
model are as follows: (i) most present-day introns are the
relics of retrotransposons; (ii) copies of the transposable
sequence were contained within the primary RNA transcript;
(iii) RNA splicing activity encoded by the transposable
elements processed the transcripts into exon and intron
sequences; (iv) the exons were then available for translation
into the gene product; (v) the spliced introns could be
reverse-transcribed into DNA and reinserted elsewhere in the
genome. Although Doolittle (1978) argued that the de novo
insertion of introns into functional genes would disrupt normal
gene expression and thus would be strongly selected against at
the organismic level, it was proposed that RNA splicing might
function solely to counteract the potential negative effect of
introns (Hickey & Benkel 1986).

A common property shared by all introns is their removal from
primary transcripts by splicing. Considerable evidence
indicates that splicing activity is controlled by the introns
themselves. For instance, some fungal mitochondrial group I and
group II introns can undergo self-splicing, which depends on
the structure of the RNA transcript, and can propagate
themselves by insertion into genes (reviewed by Lambowitz
1989). Genetic analysis of mitochondrial systems also indicated
that in vivo self-splicing depends on so-called maturases, some
of which are encoded by the introns themselves. All
characterized maturases function only in splicing the intron in
which they are encoded or closely related introns. It has been
proposed that nuclear pre-mRNA introns evolved from
self-inserted group II introns (Roger 1989) (Figure 2-12). Once
an intron is inserted, it might take only a single base change
to convert the group II intron into a classical intron. Both
types of introns now have similar consensus sequences.

Wild  Mice  As  a  Useful  Genetic  Tool 

Part of the goal of this dissertation is to determine the
distribution of the evolutionary lineages of the class II Ab
gene in the genus Mus and to determine how long these lineages
have persisted in Mus during the evolution of Ab genes.
Previous studies of the evolution of Mhc class II genes were
limited in both the number of species examined and the number
of strains tested. In this dissertation, we have extended the
previous study to include twelve species and subspecies of the
genus Mus and the 115 H-2 haplotypes derived from them.

The house mouse has become the most studied animal in
laboratory research, probably because its habitat is closest to
that of man. It has been known for some time that the major
laboratory inbred strains are derived from common ancestors
(Morse 1978). Studies of mitochondrial DNA have indicated that
most laboratory inbred strains belong to the Mus musculus
domesticus type (Ferris et al. 1982). In contrast, a Y-specific
DNA probe has revealed that the Y chromosome of most laboratory
inbred strains, except SJL, is of M. m. musculus origin (Bishop
et al. 1985). Thus the pool of segregating genes in laboratory
mice is fairly limited and probably does not reflect the mouse
species as it is in the wild (Guenet 1986). In fact, had it not
been for wild mice, the analysis of certain genetic loci, e.g.,
Mta, a maternally transmitted histocompatibility antigen, would
have suffered premature termination (Lindahl 1986).

Depending on the degree of association with humans, wild mice
can be divided into three groups: aboriginal, commensal, and
feral. Aboriginal mice live largely independently of human
constructions. Commensal mice live in close association with
man-made structures, and feral mice have resumed an aboriginal
mode of life from the commensal stage (reviewed by Sage 1981).
The aboriginal species include Mus spretus, M. spretoides (M.
macedonicus; M. abbotti), and M. spicilegus (M. hortulanus).
All introduced populations of M. domesticus in the New World
and in Australia that live in native vegetation are considered
feral forms derived from commensal ancestors. Based on the
genetic variability of wild mice, assessed with both DNA and
biochemical markers, the genus Mus can be divided into the
complex species Mus musculus and at least eight other species,
including Mus spretus, M. spretoides, M. spicilegus, M. cooki,
M. cervicolor, M. pahari, and M. platythrix (Bonhomme et al.
1984; Bonhomme 1986; Avner et al. 1988). The Mus musculus
complex species itself consists of four main biochemical
groups, Mus musculus musculus, Mus musculus domesticus, Mus
musculus castaneus, and Mus musculus bactrianus, all of which
are considered subspecies.

Figure 2-12. Proposed sequence of events by which a group II
intron could mutate into a classical intron. Adapted from
Roger (1989).

M. m. domesticus is present in Western Europe, the
Mediterranean basin, Africa, Arabia, and the Middle East, and
has been transported by ship to the New World, Australia, and
southeastern Africa, leaving few regions of the earth without
house mice. M. m. musculus occurs in Eastern Europe, extending
to Japan across the USSR and North China. M. m. bactrianus is
distributed from Eastern Europe to Pakistan and India. The
distribution of M. m. castaneus ranges from Ceylon to Southeast
Asia through the Indo-Malayan archipelago (Figure 2-13). Even
though these four subspecies are quite differentiated
biochemically, they may exchange genes wherever they come into
contact (Bonhomme et al. 1984). One of the best understood
cases is that between M. m. musculus and M. m. castaneus in
Japan (Yonekawa et al. 1986; Yonekawa et al. 1988). The
Japanese mouse, M. m. molossinus, has long been considered an
independent subspecies of the house mouse. However, restriction
enzyme analysis of mitochondrial DNA (mtDNA) indicated that M.
m. molossinus has two main maternal lineages. One lineage is
closely related to the mtDNA of the European subspecies M. m.
musculus; the other is closely related to the mtDNA of the
Asiatic subspecies M. m. castaneus.

The three aboriginal species, namely M. spretus, M. spretoides,
and M. spicilegus, may be found in sympatry with M. musculus
subspecies. M. spretoides and M. spicilegus probably represent
the best case of sibling species thus far discovered in
mammals. They are very similar morphologically and
biochemically, yet under laboratory conditions they cannot
interbreed (Bonhomme 1986). The mound-building species, M.
spicilegus, is found in the steppe grasslands of the Carpathian
basin and the Ukraine. The distribution of the short-tailed M.
spretoides is limited to southeastern Europe and Asia Minor
(mainly the eastern Mediterranean). M. spretus is found in the
western Mediterranean, from France to Libya (Figure 2-14).

Europe is not the homeland of the genus Mus. All of the Mus
species and subspecies that presently inhabit the continent
seem to have entered it with man (Bonhomme 1986). Certain
members of the genus Mus have apparently inhabited India and
Southeast Asia since their origins. Three strictly oriental
species, M. caroli, M. cervicolor, and M. cooki, form a
monophyletic group according to single-copy nuclear DNA
(scnDNA) hybridization and mtDNA data. Protein electrophoretic
data also suggest that these three Asian species speciated
almost simultaneously (She et al. 1990).

In the past, M. (Pyromys) platythrix and M. (Coelomys) pahari
were assigned to subgenera of Mus on the basis of their
morphology. They are no more closely related to Mus than they
are to other well-defined murid genera. The large, spiny M.
platythrix occurs in India. The large, shrew-like M. pahari is
present from Sikkim to Thailand. The phylogenetic relationships
among 9 species and 5 subspecies within the genus Mus, deduced
from DNA-DNA hybridization studies, are presented in Figure
2-15. The % DNA divergence detected between the various species
is shown on the left axis; the estimated time since genetic
separation of their gene pools (speciation) is listed on the
right. Similar phylogenetic relationships are obtained when
these species are compared by other techniques, such as protein
polymorphisms and mitochondrial DNA sequence divergence (She et
al. 1989). However, estimates of the genetic distance among Mus
species vary depending on the techniques employed (She et al.
1989). There are seven levels of divergence among these
species, ranging from 0.3 to 10 million years (Luckett &
Hartenberger 1985).

CHAPTER 3
MATERIALS AND METHODS

Wild  Mice 

The wild mouse strains used in this study are listed in Table
3-1, together with their geographic origins, and were kindly
provided by Dr. François Bonhomme. The distribution patterns of
these wild mice indicate that they are representative of the
global mouse population.

Source  of  Mouse  Tissues  and  Preparations  of  DNA 

Tissue samples, such as livers and kidneys, were used for the
isolation of genomic DNA. Tissue samples from different mouse
strains were minced and preserved in 75% ethyl alcohol,
according to the method described by Smith et al. (1987).
Genomic DNA was isolated from tissues by the proteinase
K/sodium dodecyl sulfate (SDS) method as detailed in Sambrook
et al. (1988). Minced tissues were washed once with PBS,
transferred to a mortar cooled with liquid nitrogen, and ground
into a fine powder. The frozen powder was added to TES buffer
(10 mM Tris-HCl, pH 7.5; 5 mM ethylenediaminetetraacetic acid
(EDTA); 100 mM NaCl) with 1% SDS and 0.4 mg/ml proteinase K,
which inactivates and digests proteins, facilitating the
isolation of DNA. This solution was incubated at 65°C
overnight. The digested solution was extracted three times with
Tris-equilibrated phenol (pH 7.5) and twice with
chloroform/amyl alcohol (24:1), and the DNA was precipitated
with an equal volume of isopropyl alcohol. The DNA was spooled
out with a Pasteur pipet and resuspended in TE (10 mM
Tris(hydroxymethyl)aminomethane-HCl, pH 7.5; 1 mM EDTA). The
resulting DNA solution was quantitated by spectrophotometry and
electrophoresed on 0.7% agarose gels to confirm its high
molecular weight. Alternatively, genomic DNA was isolated using
an automated Nucleic Acid Extractor (Applied Biosystems 340A),
following the manufacturer's instructions. Briefly, finely
ground tissue powder was suspended in 3 ml of lysis buffer
(Applied Biosystems), and 0.3 ml of proteinase K (Applied
Biosystems) was added. The digested tissue was extracted with
phenol/chloroform (50/50, v/v) to remove the digested proteins.
The DNA was precipitated from the solution by adding sodium
acetate (to a final concentration of 300 mM) and 2 volumes of
ethanol (95%). Precipitated DNA was air-dried and resuspended
in TE buffer.

Table 3-1. Geographic Origin and Distribution of Mouse Strains

STRAIN   SPECIES                       GEOGRAPHIC ORIGIN

MAI      Mus musculus musculus         Austria
MBB      Mus m. musculus               Bulgaria
MBK      Mus m. musculus               Bulgaria
MBS      Mus m. musculus               Bulgaria
MBT      Mus m. musculus               Bulgaria
MDL      Mus m. musculus               Denmark
MDS      Mus m. musculus               Denmark
MPW      Mus m. musculus               Poland
MYL      Mus m. musculus               Yugoslavia
MOL      Mus musculus molossinus       Japan
CAS      Mus musculus castaneus        Thailand

SEI      Mus spretus                   Spain
SEG      Mus spretus                   Spain
SPE      Mus spretus                   Spain
SET      Mus spretus                   Spain
SFM      Mus spretus                   France
SMA      Mus spretus                   Monaco
STF      Mus spretus                   Tunisia
XBJ      Mus spretoides                Bulgaria
XBS      Mus spretoides                Bulgaria

ZBN      Mus spicilegus                Bulgaria
ZRU      Mus spicilegus                U.S.S.R.
ZYD      Mus spicilegus                Yugoslavia
ZYP      Mus spicilegus                Yugoslavia

KAR      Mus caroli                    Thailand
COK      Mus cookii                    Thailand
CRV      Mus cervicolor cervicolor     Thailand
CRP      Mus cervicolor popaeus        Thailand
PAH      Mus pahari                    Thailand
PTX      Mus platythrix                India

Restriction Enzyme Digestion and Agarose Gel Electrophoresis

Restriction enzymes (Bgl II, Bam HI, Eco RI, Hind III, Pst I,
Pvu II, Sst I) were obtained from Bethesda Research
Laboratories (BRL). Restriction enzyme digestions were carried
out for about 18 hr in a volume of 90 or 180 µl containing 20
µg of DNA and 4 units of enzyme per µg of DNA, under the
conditions specified by the supplier (Bethesda Research
Laboratories, Bethesda, Maryland). Completeness of digestion
was monitored by using lambda DNA coincubated with aliquots of
the DNA samples. Briefly, 0.5 µg of lambda DNA was added to
one-tenth volume of the reaction mixture, and at the end of the
incubation period this aliquot was electrophoresed on an
agarose gel. A characteristic restriction pattern of lambda DNA
and a homogeneous smear of genomic DNA are indicative of
complete digestion. In the case of double digestion, the
digestion was first performed with the enzyme requiring the
lower salt concentration. After completion of the first
digestion, the salt content of the digest was adjusted, and the
buffer and enzyme for the second digestion were added. For
convenience, the volume of double digests was reduced by
alcohol precipitation before loading onto the gel. Briefly,
one-tenth volume of 3 M sodium acetate was added to the digest,
followed by 2 volumes of 95% alcohol, and the mixture was
stored at -70°C for 30 min. The precipitate was recovered by
centrifugation at 12,000 x g in a microfuge for 20 min and
washed with cold 70% ethyl alcohol. The precipitate was then
dried and resuspended in 80 µl of TE. The digests were
subjected to electrophoresis in 0.7% agarose gels for 16 hours
at 3 V/cm in a water-cooled electrophoresis apparatus
(International Biotechnologies Incorporated, New Haven,
Connecticut).
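The quantities in the digestion and precipitation steps follow from simple ratios, which the short Python sketch below works through. The function names are illustrative and not part of the original protocol. The sketch assumes that the "2 volumes" of alcohol are measured against the digest after the sodium acetate has been added; the text does not state this explicitly.

```python
def enzyme_units(dna_ug, units_per_ug=4.0):
    """Total enzyme units for a digest at the stated units-per-microgram ratio."""
    return dna_ug * units_per_ug


def precipitation_volumes(digest_ul):
    """Volumes (in microliters) of 3 M sodium acetate and 95% alcohol to add.

    Assumption: one-tenth volume of acetate relative to the digest, then
    2 volumes of alcohol relative to digest plus acetate.
    """
    acetate_ul = digest_ul / 10.0
    alcohol_ul = 2.0 * (digest_ul + acetate_ul)
    return acetate_ul, alcohol_ul


if __name__ == "__main__":
    print(enzyme_units(20))            # units of enzyme for 20 ug of DNA
    print(precipitation_volumes(90))   # acetate and alcohol for a 90 ul digest
```

For a 20 µg digest this gives 80 units of enzyme; a 90 µl digest takes 9 µl of sodium acetate and, under the stated assumption, 198 µl of alcohol.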

Probes 

A 5.8 kb Eco RI fragment containing the Ab genomic probe was
kindly provided by Dr. Leroy Hood. A 369 bp Eco RI-Hind III
fragment and a 911 bp Hind III-Eco RI fragment were generated
from the 5' and 3' regions, respectively, of the Ab genomic
probe and subcloned into pUC19 (Figure 3-1).

Capillary  Transfer  and  Hybridization 

The restriction enzyme-digested DNA was transferred from the
gel to Zetabind membrane (Microfiltration Products Division,
Meriden, Connecticut) by the method of Southern (1975)
according to the manufacturer's instructions. The agarose gel
was denatured in 0.2 N NaOH, 0.6 M NaCl for 30 min at room
temperature and then neutralized in 0.5 M Tris, pH 7.5, 1.5 M
NaCl for 30 min at the same temperature. After blotting, the
membranes were washed with 2X SSC (1X SSC = 0.15 M NaCl, 0.015
M Na3C6H5O7) to remove agarose residue and then washed in 0.1X
SSC, 0.5% SDS for 1 hr at 65°C in a shaking water bath to
reduce the background. Subsequently, the membranes were dried
either in an 80°C vacuum oven for 3 hours or at room
temperature, and stored until further use. Hind III-cut lambda
DNA was included on every gel as a molecular-weight standard.
Prehybridization and hybridization of the membranes were
carried out as instructed by the manufacturer (AMF, Meriden,
CT). The blots were hybridized overnight at 42°C with
32P-labeled DNA probe, labeled by primer extension (Bethesda
Research Laboratories, Bethesda, MD) to a specific activity of
approximately 2 X 10® dpm/µg. Nonspecifically bound probe was
removed by two successive washes in 0.1X SSC/0.1% SDS at 65°C
in a shaking water bath. The blots were then exposed to XAR-5
X-ray film (Kodak, Rochester, NY) using Cronex Lightning-Plus
intensifying screens (Du Pont, Wilmington, Delaware).
Alternatively, the DNA was blotted to GeneScreen membrane (Du
Pont, NEN Products, Boston, MA). With this membrane, the gel
was depurinated in 0.25 N HCl for 10 min and then denatured in
0.2 N NaOH, 0.6 M NaCl before blotting. After the DNA was
transferred onto the membrane, the membrane was dipped in 0.4 N
NaOH for 30-60 seconds to ensure complete denaturation of the
DNA. The membrane was then neutralized in 2X SSC adjusted with
Tris buffer (pH 6.0) for 30-60 seconds. Subsequently, the DNA
was UV cross-linked to the membrane for 1.5 min.
Prehybridization and hybridization were carried out in a
solution containing 1% crystalline grade bovine serum
albumin/0.5 mM EDTA/0.5 M NaHPO4, pH 7.2/1% SDS (Church &
Gilbert 1984).


Genomic  Restriction  Mapping 


To construct the restriction map, after autoradiography the
blots were stripped of the genomic Ab probe by washing in 0.1X
SSC and 0.1% SDS at 80°C for 20 min and were rehybridized with
the labeled 5' and 3' regions of the Ab probe, respectively.
The fragments detected by each region of the probe were used to
orient the restriction sites. All unique alleles were
characterized by double digestion to confirm the results of
restriction mapping by the above method. In some cases, the
fragment sizes were assigned to one or the other allele in Ab
heterozygotes according to the restriction patterns of known
alleles. To facilitate comparisons among different alleles, a
prototypical allele from each lineage, B10.D (d haplotype,
lineage 1), C57BL/10 (b haplotype, lineage 2), and B10.BR (k
haplotype, lineage 3), was included on each gel used for
restriction analysis.
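The use of double digests to confirm a map can be illustrated with a minimal Python sketch: given candidate site positions for two enzymes on a linear region, it predicts the single-digest and double-digest fragment sizes that the map implies, which can then be compared with the observed bands. The coordinates below are hypothetical, chosen only for illustration, and the sketch ignores the fact that a Southern blot shows only fragments overlapping the probe.

```python
def fragments(length, sites):
    """Fragment sizes (bp) from cutting a linear region of `length` at `sites`."""
    cuts = [0] + sorted(set(sites)) + [length]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]


def double_digest(length, sites_a, sites_b):
    """Fragment sizes expected when both enzymes cut the same region."""
    return fragments(length, list(sites_a) + list(sites_b))


if __name__ == "__main__":
    region = 15000              # hypothetical region length in bp
    enzyme_a = [4000, 11000]    # hypothetical sites for enzyme A
    enzyme_b = [6500]           # hypothetical site for enzyme B
    print(fragments(region, enzyme_a))
    print(fragments(region, enzyme_b))
    print(double_digest(region, enzyme_a, enzyme_b))
```

A candidate map is supported when the predicted double-digest sizes match the observed ones and each single-digest fragment is accounted for by the sum of the double-digest fragments it contains.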


Nucleotide  Sequencing 


A recombinant plasmid, pI-Ab^k-gpt-1, containing the entire
Ab^k gene plus flanking sequence was kindly supplied by Dr.
Ronald N. Germain. A 9.3 kb Hind III-Eco RI fragment from this
plasmid was subcloned in pUC19 (PUC-K-9.3). Subsequently, both
the 1.9 kb Pvu II-Sst I and the 1.7 kb Sst I fragments (derived
from PUC-K-9.3), covering the 5' and 3' portions of intron 2 of
the Ab^k gene, were subcloned in pUC19, M13mp18, and M13mp19
(Figure 3-2). Because the 1.9 kb Pvu II-Sst I fragment cloned
into M13 frequently suffered deletions of various lengths owing
to its repetitive elements, it was cloned into pBluescript
SK(+) and pBluescript KS(+) as well. The nucleotide sequences
of both the 1.9 kb Pvu II-Sst I and 1.7 kb Sst I fragments were
determined in both orientations by Sanger's dideoxynucleotide
termination method using Sequenase (United States Biochemical
Corporation, Cleveland, Ohio) according to the manufacturer's
instructions without modification. Ambiguities were eliminated
either by substituting 7-deaza-dGTP for dGTP or by using Taq
DNA polymerase (United States Biochemical Corporation,
Cleveland, Ohio), with reactions performed at elevated
temperature (labelling reaction at 45°C, termination reaction
at 70°C) to eliminate gel compression.

Data  Analysis 

RFLP Patterns of Ab Alleles and Their Phylogenetic
Relationships

To investigate the evolutionary relationships of Ab genes
assembled from 12 different Mus species and subspecies, their
restriction maps were analyzed by parsimony analysis. A total
of 86 Ab alleles, obtained from this dissertation, McConnell et
al. (1986, 1988), and Ying Ye, are included in this analysis.
Restriction site polymorphisms were used to derive the most
parsimonious network, that is, the network requiring the
minimum number of character state changes to account for the
phylogenetic relationships among the genes.
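The quantity that parsimony minimizes can be sketched in Python with Fitch's counting procedure, which gives the smallest number of state changes a tree needs to explain each character. This is an illustration only, not the PHYLIP implementation used in this study. The four alleles A to D and their presence/absence patterns ("1" = restriction site present) are hypothetical, and the trees are written as nested pairs with an arbitrary root, which does not affect the count.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character on a binary tree.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its character state
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its state set is fixed
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed
            return common
        changes += 1                       # children disagree: one change
        return left | right

    visit(tree)
    return changes


def tree_length(tree, matrix):
    """Total changes over all sites; matrix maps leaf name -> site string."""
    n_sites = len(next(iter(matrix.values())))
    return sum(
        fitch_changes(tree, {leaf: row[i] for leaf, row in matrix.items()})
        for i in range(n_sites)
    )


if __name__ == "__main__":
    # Hypothetical restriction-site matrix (rows: alleles, columns: sites).
    sites = {"A": "1100", "B": "1101", "C": "0011", "D": "0010"}
    candidates = [
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
        (("A", "D"), ("B", "C")),
    ]
    for tree in candidates:
        print(tree, tree_length(tree, sites))
```

With these toy data the first grouping needs 5 changes and the other two need 8 and 7, so the first would be retained as the most parsimonious network.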

Computer  Programs 

The computer programs used were all from the package
distributed by J. Felsenstein under the name PHYLIP 3.0. These
programs generate phylogenetic trees, and many of them use
algorithms designed to identify the tree(s) that incorporate
minimal convergent change. However, the programs are to some
extent dependent on the input order of the character sets and
consequently must be run repeatedly with the sets input in
different orders. As evolutionary trees were constructed by the
parsimony method, only the most parsimonious networks,
requiring the minimum number of character state changes, were
displayed. Note that the phylogenetic trees constructed by
these programs are unrooted. This analysis is based on 41
variable sites recognized by the 7 different restriction
enzymes, of which 29 were phylogenetically informative.

Polymerase Chain Reaction (PCR) Amplification

Enzymatic  Amplification  of  Genomic  DNA 

Polymerase chain reaction (PCR) was performed with a GeneAmp
kit (Cetus), using the recommended buffer formulas and modified
conditions. Samples were first heat-denatured at 94°C for 1.5
minutes, then cooled down to 0°C. Subsequently, the DNA was
subjected to 35 cycles of PCR, each consisting of 1 minute of
denaturation at 94°C, 2 minutes of annealing at 62°C, and 3
minutes of polymerization at 72°C with 3 units of Taq
polymerase. A typical PCR reaction consisted of 0.5-1 µg of
target DNA in a 100 µl reaction mixture containing 10 µl of 10X
buffer (10X buffer = 500 mM KCl, 100 mM Tris-Cl, pH 8.3, 15 mM
MgCl2, 0.15 (w/v)), 10 µl of dNTP mix (2.0 mM for each dNTP),
100 pmol of each primer, 5 µl of dimethyl sulfoxide (DMSO), and
3 units of Taq polymerase. Finally, the reaction mixtures were
overlaid with approximately 60 µl of sterile mineral oil to
prevent evaporation. After PCR amplification, one-tenth of each
reaction mixture was electrophoresed in TBE buffer on a 4%
NuSieve agarose gel and visualized by ethidium bromide
staining. 5' and 3' oligonucleotide primers (Figure 3-3)
complementary to conserved regions flanking the 174 bp small
insert were used to amplify 106 H-2 haplotypes in our
collection. For restriction enzyme analysis, the amplified
products were concentrated by ethanol precipitation.
Precipitates were resuspended in TE buffer and digested under
appropriate conditions.
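The final concentrations in the 100 µl reaction and the total programmed cycling time follow directly from the figures given above, as the Python sketch below works out. The variable and function names are illustrative, and the time estimate assumes instantaneous temperature changes (ramp times between steps are ignored).

```python
def final_concentration(stock_conc, stock_ul, total_ul):
    """Concentration after diluting `stock_ul` of stock into `total_ul`."""
    return stock_conc * stock_ul / total_ul


TOTAL_UL = 100.0

# 10X buffer components (mM) as listed in the text; 10 ul goes into 100 ul.
BUFFER_10X_MM = {"KCl": 500.0, "Tris-Cl": 100.0, "MgCl2": 15.0}
buffer_final_mM = {
    name: final_concentration(conc, 10.0, TOTAL_UL)
    for name, conc in BUFFER_10X_MM.items()
}

# dNTP mix: 2.0 mM each, 10 ul in 100 ul.
dntp_final_mM = final_concentration(2.0, 10.0, TOTAL_UL)

# Primers: 100 pmol in 100 ul is 1 pmol/ul, i.e. 1 micromolar.
primer_final_uM = 100.0 / TOTAL_UL

# Programmed time: 1.5 min initial denaturation + 35 cycles of (1 + 2 + 3) min.
cycling_minutes = 1.5 + 35 * (1 + 2 + 3)

if __name__ == "__main__":
    print(buffer_final_mM)
    print(dntp_final_mM, primer_final_uM, cycling_minutes)
```

Under these assumptions the reaction contains 50 mM KCl, 10 mM Tris-Cl, 1.5 mM MgCl2, 0.2 mM of each dNTP, and 1 µM of each primer, and the programmed steps total 211.5 minutes.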

Amplification of Central Fragment for DNA Hybridization

To characterize the genetic nature of the central fragment
bounded by two members of the B1 family in the 539 bp insert in
lineage 3 alleles, 5' and 3' oligonucleotide primers flanking
this region of DNA were designed and used to amplify the
plasmid PUC-K-1.9 encompassing this region (Figure 3-4 & Figure
3-5). The amplified DNA products were estimated to be 235 bp in
length and were subsequently purified from a 6% polyacrylamide
gel. The isolated 230 bp DNA fragments were radiolabelled by
primer extension and used to hybridize the blots.

Figure 3-5. The nucleotide sequence of the 539 bp insert.
Shaded areas indicate the direct repeats bordering the insert.
The two B1 family repeats at the left and right ends of the
insert are underlined, and the central fragment bounded by the
two B1 elements is double underlined. The 5'
(GCCCCTTTAACTTTTAATAT) and 3' (TGCTCCCAGTCCCAAGGCTTT) oligomers
used for PCR amplification are shown by dashed lines over the
oligomer sequences. The amplified product is 235 bp long.

TTTCGAGACAGGGTTTCTCTGTGTAGCTCTGGCTGTCCTGGAACTCACTTTGTAGACCAG
direct repeat

GCTGGCCTCGAACTCAGAAATCCGACTGCCTCTGCCTCCCAAGTGCTGGGATTAAAGGCA
"Alu-like" repeat (B1)

>5'
TGAACCACCACGCCCGGCCCCTTTAACTTTTAATATCCTCTTTGTCTTAAGATGAGTCCA
Non-repetitive element

GGCTGGCCTCCGTTCTCCACAATGCCCCTGCCTCAGCCTCTCATGCTCTCCACAGCAAAG

CCTATATCCTTTTATGTGAAACATAGGTATATAGTTTAATGTGTTTATTACCTGCAATGG

3'<
CTGGGAATGGAACCCAACCAAGGCTTCAAGGCCTCCTTCGGCCAATCTGCTCCCAGTCCC

AAGGCTTTTTTTTTTTTTTTTTTTTTTCAAGACAGGGTTTCTCTGTATAGCCCTGGCTAT
"Alu-like" repeat (B1)

CCTGGAACTCACTTTGTAGACCATGCTGGCCTCCAACTCAGAAATCTGCCTGCCTCTGCC

TCCCGAGTGCTGGGATTAAAGCATGCGCCACCATGCCCGGCTACTTAAATTTTTTTGTTT

GTTTGTTTGTTTGTCTGTTTGTTTCGAGACAGGGTTTCTCTGT
direct repeat
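The length of a PCR product is set by the positions of the two primer sites on the template, which the Python sketch below illustrates using the two oligomer sequences from Figure 3-5. Both oligomers are treated as written on the same strand as the figure, so the 3' oligomer is searched for directly rather than as its reverse complement. The spacer and flanking sequences are hypothetical placeholders, so the length returned is not the length of the real product.

```python
def amplicon_length(template, upstream_site, downstream_site):
    """Length from the start of the upstream primer site to the end of the
    downstream site, or None if either site is missing.

    Both sites are given in the sense of the template strand shown.
    """
    start = template.find(upstream_site)
    if start < 0:
        return None
    end = template.find(downstream_site, start + len(upstream_site))
    if end < 0:
        return None
    return end + len(downstream_site) - start


if __name__ == "__main__":
    five_prime = "GCCCCTTTAACTTTTAATAT"      # 5' oligomer from Figure 3-5
    three_prime = "TGCTCCCAGTCCCAAGGCTTT"    # 3' oligomer, top-strand sense
    spacer = "ACGT" * 10                     # hypothetical 40 bp spacer
    toy_template = "AAAA" + five_prime + spacer + three_prime + "GGG"
    print(amplicon_length(toy_template, five_prime, three_prime))
```

For the toy template the function returns 81, the sum of the two primer sites (20 and 21 bp) and the 40 bp placeholder spacer.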


CHAPTER  4 

SEQUENCE  ANALYSIS  OF  LINEAGE  3  ALLELES 

Restriction  Enzyme  Analysis  of  Lineage  3  Alleles 

Restriction-Site  Polymorphism  of  Lineage  3  Alleles 

In a previous study of Ab genes of the genus Mus by RFLP
analysis (McConnell et al. 1988) using seven six-cutter enzymes
(Eco RI, Bam HI, Sac I (Sst I), Hind III, Pst I, Bgl II, and
Pvu II), the Ab genes were grouped into three distinct
evolutionary lineages based on the extent of sequence
divergence. Lineage 3 consists of four Ab alleles: B10.BR (k
haplotype), B10.PL (u), NZW (z), and B10.CHA2 (w26). Genomic
restriction mapping of these alleles was first carried out
using single restriction enzyme digestions, followed by
hybridization with the 5' and 3' regions of the Ab probe,
respectively. To confirm the restriction maps, double-digest
experiments were performed, as exemplified in Figure 4-1 and
Figure 4-2. In this study, three additional lineage 3 alleles,
MDLII, DBVII, and DFCII, were revealed by RFLP analysis using
the same seven restriction enzymes. The RFLP patterns and
restriction maps of these seven lineage 3 alleles are shown in
Table 4-1 and Figure 4-3. In both B10.PL and NZW, there is one
small insertion-deletion site (indicated by a solid triangle in
Figure 4-3), estimated to be about 100 bp in length. Size
changes smaller than this were not detected. A few
lineage-specific restriction sites, denoted by encircled
letters, were also revealed by restriction analysis (Figure
4-3).

Figure 4-1. Restriction mapping performed by double digest
experiment. The restriction patterns of closely related
alleles were compared side by side.

Figure 4-2. Restriction mapping carried out by double digest
experiment.

Table 4-1. RFLP Patterns of Lineage 3 Ab Alleles

Strain     Pst I   Eco RI   Bam HI   Pvu II   Sst I   Bgl II   Hind III

B10.BR     4.4     15       8.1      4.55     7.8     13       15
           2.6  2.75  4.6
           2.06  1.7

B10.CHA2   4.4     20       8.4      4.55     7.8     13       15
           2.6  2.75  4.6
           2.06  1.7

B10.PL     4.4     15       5.4      4.55     5.2     13       8.5
           2.6  2.75  4.6  7.5
           2.06  2.65
           1.7

MDL-2      4.4     15       8.4      4.55     7.8     13       15
           2.6  2.75  4.6
           2.06  1.7

NZW        4.4     15       5.4      4.55     5.2     13       8.5
           2.6  2.75  4.6  7.5
           2.06  2.65
           1.7

DBV2       4.4     15       8.4      4.55     4.6     13       8.5
           2.0  1.85  2.65  7.5
           1.7

DFC2       4.4     15       8.4      4.55     4.6     13       8.5
           2.0  1.85  2.65  7.5
           1.7

Distinct  Intron  Size  Between  Lineage  2  and  3  Alleles 

A comparison of the genomic structures of prototypic lineage 2
(b haplotype) and lineage 3 (k haplotype) alleles is shown in
Figure 4-4. Among other differences, the major characteristic
distinguishing lineage 2 and 3 alleles resides in the intron
separating the Aβ1 and Aβ2 exons. The size difference between
these two introns was estimated to be 0.75 kb by comparing the
Pvu II fragments from Ab^b (3.79 kb) and Ab^k (4.6 kb).

DNA  Sequence  of  Lineage  3  Intron 

To clearly define the nature of the lineage 3 intron between
the Aβ1 and Aβ2 exons and the evolutionary relationships among
alleles of the different lineages, DNA sequence analysis was
performed. The recombinant plasmid pI-Ab^k-gpt-1 containing the
Ab^k gene was subcloned, and relevant regions were sequenced by
Sanger's dideoxynucleotide termination method (Sanger et al.
1980). A total of 3,735 bp of DNA sequence, spanning the intron
between the Aβ1 and Aβ2 exons through the Aβ2 exon and the
transmembrane region of a lineage 3 (Ab^k) allele, was
determined. The sequencing strategy and the 3,735 bp of
nucleotide sequence determined are shown in Figure 3-2 and
Figure 4-5, respectively.

Lineage  3  Derived  from  Lineage  2 

The evolutionary relationships among these 3 lineages
were assessed by comparing the published nucleotide sequences
of lineage 1 (Ab^d) and lineage 2 (Ab^b), obtained from GenBank,
with the lineage 3 (Ab^k) sequence determined here. Several
notable features of the lineage 3 intron were revealed by this
sequence analysis (Figure 4-6 & Figure 4-7). Two additional
inserted DNA sequences are present in the lineage 3 allele
and absent from lineages 1 and 2. One of them is 174 bp long;
its integration site starts 508 bp downstream of the Aβ1 exon
of Ab^k and ends at nucleotide position 681. This small insert
is flanked by 11 bp direct repeats (ATTCTGATACA). The other
inserted element is 539 bp in length; its integration site
starts 1141 bp 3' of the Aβ1 exon and ends at 1679 bp, and it
is flanked by 22 bp direct repeats (TTTCGAGACAGGGTTTCTCTGT).
Of great interest, this large insert is interposed in the
861 bp retroposon that distinguishes lineage 2 from lineage 1
alleles. Probably as a result of this insertional event,
130 bp of the 861 bp retroposon are deleted. The 618 bp of
retroposon that are retained in the lineage 3 allele share 89%
sequence identity with the retroposon sequence of lineage 2
(Figure 4-8), indicating that the lineage 3 allele is derived
from lineage 2. The nature of the retroposon insertion, shown
by the generation of a direct repeat bordering the inserted
sequence, demonstrates again that the lineage 3 allele was
generated from lineage 2. The results of this sequence
analysis, including the relative locations of the various
retroposon insertions and the percentage of nucleotide sequence
homology in the corresponding regions, are summarized in
Figure 4-9.
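
The insert lengths quoted above follow from the reported start and end positions if both endpoints are counted. The short Python sketch below simply checks that arithmetic; the dictionary keys and the function name are illustrative choices, and the assumption that the coordinates are inclusive is ours, not a statement from the sequence analysis itself.

```python
# Start and end positions (bp 3' of the upstream exon) as quoted in the text.
# Assumption: both endpoints are counted (inclusive coordinates).
INSERT_SPANS = {
    "small": (508, 681),    # B1-like insert, reported as 174 bp
    "large": (1141, 1679),  # composite insert, reported as 539 bp
}


def inclusive_length(start: int, end: int) -> int:
    """Length of a feature when both the first and last positions count."""
    return end - start + 1


insert_lengths = {
    name: inclusive_length(start, end)
    for name, (start, end) in INSERT_SPANS.items()
}

for name, length in insert_lengths.items():
    print(f"{name} insert: {length} bp")
```

Under that assumption the computed lengths agree with the 174 bp and 539 bp figures given in the text.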

B1 family repeats in lineage 3 alleles

A comparison of the 174 bp inserted sequence with DNA
sequences from GenBank indicates that it is highly homologous
to the B1 family of rodent Alu-like repeats (Krayev et al.
1980). It is characterized by an A-rich tract at its 3' end
and contains putative RNA polymerase III promoters, indicated
by boxes (Figure 4-6 and Figure 4-10). A consensus RNA pol III
promoter sequence, compiled by Galli et al. (1981) from
functional tRNA and ribosomal RNA genes, is shown above the
boxes. Sequence analysis identified two further members of the
B1 family, at the left and right ends of the large 539 bp
insert. However, these two B1 family repeats do not have
terminal direct repeats. Interestingly, the first 16 residues
of the left-end B1 repeat also form part of the direct repeat
flanking this large insert.
Ab^b  Limits: 5190-5789
Ab^k  Limits: 1702-2301

5190
ATAGCCCTGGCTGTCCTGGAACTCACTCGGTAGACCAGGCTGGCCTCGAACTCAGAAATC
ATAGCCCTGGCTGTCCTGGAACTCACTCGGTAGA  CA  GATGGCCTC  AACTCAG  AATC
1702

CACCTACCTCTGCCTCCCGAGTGCTGGGAGTAAAGGTGTGCACCACCACTGCCCGGCGAA
CACCTGCCTCTGACTCCCAAGAGCTAGGATTAAAGGTGTGCACCATCACCACCCGGCTAA

ACATTTTAATAGATATTTTCTTCATTTACATTTCAAATGCTATCCCAAAAGTCCCCTATA
ATTTTTTATTAGATATTTTCTTCATTTACATTTCAAATGCTATCCCAAAAGTCCCCTATA

CCCTCCTCCCCCGCACCGCCCTGCTCCCCCTACCCACCCACTCCCACTTTTTGGCCCTAG
CCCAC  CCACCCTGCT  CCCCTACCCACCCACTCCCGCTTCTTGGCCCTGG

CGTTCCCCTGTACTGGGGCATATAAAGTTTACAAGACCAAGGGGCCTCTCTCCCCAATGA
CATTCCCCTGTACTGGGGCATATAAAGTTTACAAGACCAA  GGGCCTCTCTCCCCAATGA

TGGC  TGACTAGGCCATCTTCTGCTACATATGCAGCTAGAGACACGAGCT  CTGGGGGTA
TGGCTTGACT  GGTCATCTTCTGCTACATATGCAACTAGAGACACGAGCTCCTGGGGATA

CTCGTTAGTTCATATTGTTGTTCCACCTATATGGTTGTAGACCCCTTCAGCTCCTTGGGT
TTGATTAGTTTATATTGTTGTTCCACCTATAGAGTTGCAGACCCCTTCAGCTCCTTGGGT

ACTTTCTCTAACTCCTCCATTGGGGGCCCTGTGTTCTATCCTATAGATGACTGTGAGCAT
ACTTTCTCTAACTCCTCCATTGGGGGCCCTGTGTTCCATCCTATAGATGACTGTGAGCAT

CCATTTCTGTATTTGCCAGGCACTGGCATAGCCTCA  CAGGGTCC
CCACTTCTGTATTTGCCAGGTA  TTGCATAGCCTCACAAGAGACAGTTATATCAGGGTCC

TTTCAGCATAATTTTGCTGGCATATGCAATAGTGTCTGCGTTTGGTGGCTGATTATGGGA
TTTCAGCATAATTTTGCTGGCATATGCAATAGTGTCTGCGTTTGGTGGCTGATTATGGG^

TGGATCCCCGGGTGGGGC
TGGATCCCCGGGTGGGGC

Matches = 548    Mismatches = 34    Unmatched = 36
Length = 618     Matches/length = 88.7 percent
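
The summary line of the alignment can be reproduced from its three counts. The sketch below is only an arithmetic check, written under the assumption that the reported length is the sum of matched, mismatched, and unmatched (gapped) positions; the function name is hypothetical.

```python
def alignment_summary(matches: int, mismatches: int, unmatched: int):
    """Return (alignment length, percent identity rounded to 0.1).

    Assumption: alignment length = matches + mismatches + unmatched,
    where "unmatched" counts positions opposite a gap.
    """
    length = matches + mismatches + unmatched
    percent_identity = round(100.0 * matches / length, 1)
    return length, percent_identity


# Counts taken from the alignment summary above.
length, percent_identity = alignment_summary(548, 34, 36)
print(f"Length = {length}, identity = {percent_identity} percent")
```

With these counts the length comes to 618 positions and the identity to 88.7 percent, which the text rounds to 89%.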
Sequence data also indicate that putative RNA polymerase
III split promoters can be recognized in these two members of
the B1 family (Figure 4-10). It is worth noting that the
transcriptional direction of the latter two B1 family repeats
is opposite to that of the Ab^k gene (Figure 4-7).

An alignment of the three B1 family repeats identified
in these two inserted sequences with the B1 family consensus
sequence (Kalb et al. 1983; King et al. 1986) is shown in
Figure 4-10. The sequence homology ranges from 93% to 97%,
with the B1 member in the small insert being the highest
(97%) and the B1 member at the right end of the large insert
the lowest (93%). Most of the sequence divergence is due to
single base substitutions. Mismatches between the putative RNA
pol III split promoters and the consensus sequence are marked
by asterisks (Figure 4-10; Galli et al. 1981). The structure
and sequence of the 539 bp insert were analyzed in further
detail.

The  539  bp  insert  defines  a  new  family  of  murine  repeat 

To understand the genetic nature of the central
fragment of this 539 bp inserted element, an extensive
computer search of the GenBank DNA sequence library was
undertaken. No homologous sequences were found. To
determine the genomic distribution of the core portion of the
539 bp insert, a 235 bp DNA fragment confined within the
middle portion of the insert was amplified by PCR and
hybridized to restriction enzyme-digested genomic DNA. The
pair of oligomers (5' GAAATCCGACTGCCTCTGCC 3', 5'
TGCTCCCAGTTCCCAAGGCTTT 3') used for amplification, the
length of the amplified product, and its nucleotide sequence
are shown in Figure 3-4. The results of the Southern analysis
are shown in Figure 4-11 and Figure 4-12. Surprisingly,
hybridization with the isolated 235 bp fragment gave a distinct
band pattern in all of the strains studied. As expected, the
size of one of the two bands in lineage 3 alleles, e.g. B10.PL
and NZW, was consistent with their genomic restriction maps. To
locate these bands in the genomic structure, the same membrane
hybridized with an Ab^b probe was also included for clarity
(Figure 4-11). It is worth mentioning that the hybridized
bands are polymorphic among the alleles of all three lineages
studied (Figure 4-11). This result suggests that the 539 bp
inserted sequence belongs to a new family of repeated
sequences. Since the core portion did not display evidence
of integration, it is likely that the core portion and its
adjacent B1 family repeats transpose as a single unit, and
that the 22 bp host-derived repeats were generated as a
consequence of this insertion event.


Ab  Genes  Can  Be  Divided  into  4  Lineages 


Defining Evolutionary Lineage 2B

Although the DNA sequence analysis of the lineage 3 allele
(Ab^k) clearly indicates that lineage 3 is derived from
lineage 2 by two additional insertional events, it is
unlikely that these events occurred in the same region
simultaneously. As one of the inserted sequences (the B1
repeat) is only 174 bp in length, it is tempting to speculate
that its insertion is beyond detection on a 0.7% agarose gel.
To examine this possibility, PCR was used to amplify genomic
DNA with a pair of synthetic oligomers
(5' CCTTGAGGGCCACGGTTGTC 3', 5' GATACCCCCAGAGCCTCTCA 3')
(Figure 3-3). The rationale for this PCR experiment is as
follows: any allele that contains the 174 bp B1 family repeat
will be amplified as a 375 bp fragment, while alleles without
this insert will yield a 192 bp fragment (Figure 3-3). A
total of 106 H-2 haplotypes were tested by PCR amplification.
A panel of PCR-amplified DNA samples representing the
different species and subspecies of the genus Mus was run on a
4% Nusieve agarose gel (Figure 4-13). The results of these
experiments can be summarized as follows. First, as expected,
all lineage 3 alleles, including B10.BR, AKR, B10.CHA2,
B10.PL, NZW, MDLII, DFCII, and DBVII, amplify a band of around
375 bp on a 4% Nusieve agarose gel (Figure 4-14). Certain
recombinant inbred strains, e.g. B10.MBR, B10.A(4R), and
B10.TL, exhibit a 375 bp band as well (Figure 4-14). This
outcome is not unexpected, as these recombinants contain an
I-A subregion derived from lineage 3 alleles, specifically
from the k haplotype. All lineage 1 and 2 alleles, with the
exception of one allele, MBBII, exhibit an


Figure 4-13. PCR amplification of DNA samples from 12 species
and subspecies of Mus. H: Hind III-digested lambda
markers, P: Pst I-digested lambda markers, Kb: kilobase
markers, m. dom.: M. m. domesticus, m. mus.: M. m. musculus,
spretus: M. spretus, sptd: M. spretoides, spic: M. spicilegus,
caroli: M. caroli, cooki: M. cooki, cerv: M. cervicolor
cervicolor, cerp: M. cervicolor popaeus, pahari: M. pahari,
plat: M. platythrix.

Figure 4-14. PCR amplification of DNA samples from lineage 3
alleles and recombinant inbred strains.
amplified DNA fragment of approximately 192 bp. Unexpectedly,
MBBII, a lineage 2 allele as identified by RFLP, apparently
contains the 174 bp insert (B1 repeat) in the same region as
lineage 3 alleles do. In fact, before the PCR experiments were
completed, Southern blot analysis and restriction mapping had
already revealed the unusual Sst I restriction fragment of
MBBII (2.3 kb vs 2.1 kb) (Figure 4-15 & Table 5-1). To confirm
that this lineage 2 allele (MBBII) contains the B1 family
repeat, the PCR-amplified product was isolated and subjected
to restriction enzyme analysis. The results of this
restriction analysis are shown in Figure 4-16. Four DNA
samples crucial to this analysis were included in the
experiment: k haplotype (lineage 3), d haplotype (lineage 1),
MBB, and MBS. The MBB DNA sample was heterozygous for lineage 1
and lineage 2, and MBS was heterozygous for lineage 1 and 3
(Figure 4-15 & Table 5-1). A conserved Hinc II site, found in
lineage 1 and 2 but not in lineage 3, would yield two bands,
90 bp and 100 bp, upon digestion (Figure 3-3). However, the
restriction analysis clearly points to the absence of the
Hinc II site in the MBBII allele. Moreover, the Hinf I site
conserved in all three lineages is also present in the MBBII
allele, as shown by the production of two fragments, 120 bp
and 255 bp in length, upon digestion. Taken together, the
findings of this analysis demonstrate that although the MBBII
allele belongs to lineage 2, it does contain the 174 bp insert
in the
Figure 4-16. Restriction analysis of PCR-amplified products.
Letter designations are as follows: d: lineage 1 (Ab^d), k:
lineage 3 (Ab^k); MBB and MBS are heterozygous: lineage 1, 2
and lineage 2, 3, respectively. H: Hind III-digested lambda
markers, P: Pst I-digested lambda markers, Kb Ladder: kilobase
markers.
corresponding region of the lineage 3 allele. As a consequence
of these findings, the MBBII allele is assigned to lineage 2B,
which consists of the MBBII allele only, and the original
lineage 2 is now designated lineage 2A.
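
The PCR rationale described above (a product of about 375 bp when the B1 repeat is present, about 192 bp when it is absent) amounts to a simple size-based call. The Python sketch below illustrates that logic only; the 15 bp tolerance, the labels, and the function names are assumptions made for illustration and are not part of the experimental protocol.

```python
# Expected PCR product sizes (bp) as stated in the text.
EXPECTED_PRODUCT_BP = {"insert present": 375, "insert absent": 192}


def call_band(observed_bp: float, tolerance_bp: float = 15.0) -> str:
    """Assign one gel band to the expected product it most plausibly matches.

    tolerance_bp is a hypothetical allowance for sizing error on a gel.
    """
    for label, expected_bp in EXPECTED_PRODUCT_BP.items():
        if abs(observed_bp - expected_bp) <= tolerance_bp:
            return label
    return "unclassified"


def call_sample(band_sizes_bp) -> list:
    """Summarize all bands from one DNA sample as a sorted list of calls."""
    return sorted({call_band(size) for size in band_sizes_bp})


print(call_sample([375]))       # a sample showing only the larger product
print(call_sample([192]))       # a sample showing only the smaller product
print(call_sample([375, 192]))  # a sample showing both products
```

A sample that yields both products, as would be expected for DNA heterozygous for alleles with and without the repeat, returns both calls.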

4  Evolutionary  Lineages  of  Ab  Genes 

The evolutionary relationships of these four lineages of
Ab genes in the genus Mus are shown in Figure 4-17. In
summary, the major characteristic distinguishing the four
evolutionary lineages resides in intron 2, which separates the
Aβ1 and Aβ2 exons. The lineage 2A allele was derived from a
lineage 1 allele by an 861 bp retroposon insertion.
Subsequently, a 174 bp B1 family repeat inserted into intron 2
of a lineage 2A allele, generating lineage 2B. Eventually, a
member of a newly arisen repeat family, 539 bp in length,
integrated into a lineage 2B allele, producing lineage 3. It
is noteworthy that all four distinct lineages can be identified
in wild mouse populations, whereas all lineages except lineage
2B were found in laboratory inbred strains. The unusual
scarcity of lineage 2B alleles is illustrated by the fact that
MBBII is the only 2B allele among the 44 lineage 2 alleles in
our collection.
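
The stepwise scheme summarized above can be written as a lookup from the presence or absence of the three intron 2 insertions to a lineage name. The sketch below is a minimal illustration of that scheme; the class name, field names, and the "unassigned" fallback for combinations not described in the text are hypothetical.

```python
from typing import NamedTuple


class Intron2State(NamedTuple):
    """Presence (True) or absence (False) of each insertion in intron 2."""
    retroposon_861bp: bool
    b1_repeat_174bp: bool
    insert_539bp: bool


# Combinations described in the text, in the order the insertions arose.
LINEAGE_BY_STATE = {
    Intron2State(False, False, False): "1",
    Intron2State(True, False, False): "2A",
    Intron2State(True, True, False): "2B",
    Intron2State(True, True, True): "3",
}


def assign_lineage(state: Intron2State) -> str:
    """Return the lineage for a state, or 'unassigned' if not described."""
    return LINEAGE_BY_STATE.get(state, "unassigned")


print(assign_lineage(Intron2State(True, True, False)))  # the MBBII pattern
```

Any combination outside the four described states is returned as "unassigned" rather than forced into a lineage.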
CHAPTER 5

EVOLUTION  OF  MHC  CLASS  II  GENE  POLYMORPHISMS 

RFLP  Analysis  of  Ab  Genes  Within  Genus  Mus 

One of the goals of this dissertation is to determine the
distribution of Ab lineages among the various species and
subspecies of the genus Mus and how long these Ab lineages
have persisted in the genus. The mouse is an excellent system
in which to measure time of divergence, as the phylogenetic
relationships of its species and subspecies have been studied
extensively by various techniques (She et al. 1990a).
Previously, McConnell et al. (1988) showed that Mhc class II
Ab genes can be grouped into three evolutionary lineages on
the basis of retroposon polymorphisms. However, the number of
species and subspecies of Mus included in their analysis was
limited. The results of their analyses, in terms of the
lineage distribution of Ab genes in various species and
subspecies, are shown in Table 2-1 and Figure 2-8. In this
dissertation, 115 Ab genes have been analyzed by Southern blot
hybridization, DNA sequence analysis, and PCR amplification
and reorganized into four distinct lineages. Furthermore, this
analysis expands the molecular genetic study of Ab genes to 12
separate species and subspecies of the genus Mus. The mouse
strains included in this study and their geographic origins
are listed in Table 3-1. The mouse genomic DNAs were digested
with seven restriction enzymes (Eco RI, Bam HI, Hind III,
Bgl II, Pst I, Pvu II, Sst I) and analyzed by Southern blot
hybridization with a genomic Ab^k probe. The orientation of
restriction fragments was determined by stripping and
rehybridizing with the 5' and 3' regions of the Ab probe,
respectively. A typical mapping experiment is shown in Figure
4-15. In each case, the restriction map of the Ab allele was
further confirmed by double digestion experiments. For DNA
samples heterozygous for the Ab gene, RFLP patterns were
assigned to individual alleles by comparing restriction
fragments with those of other known alleles. The RFLP
patterns of individual Ab alleles and their corresponding
restriction maps are shown in Table 5-1 and Figure 5-1. Close
inspection of the restriction maps of these Ab alleles
indicates that the majority of the restriction site
polymorphisms are due to insertions/deletions and point
mutations that create or destroy restriction sites. It is
evident from the restriction analysis that there is no
correlation between restriction site alleles and the
distribution of species or subspecies. A total of 86 Ab
alleles were identified from the analysis of 115 H-2
haplotypes (Table 5-1 & Figure 5-1). Only unique Ab alleles
are listed in Table 5-1. Even so, similar or closely related
alleles were frequently found in
Table 5-1. RFLP Patterns of Ab Alleles From 12 Species and
Subspecies of Genus Mus.


Strain  Pst  I     EcoRI     BamHI     Pvu  II  Sac  I     Bgl  II  Hind  III 


MAI  4.80 

MBB-1  3.89 
MBB-2  4.80 

MBK  3.89 

MBS-1  3.89 
(same  as  MBK) 

MBS-2  4.80 

MBT  4.80 
MDL  I  4.80 
MDL  II  4.40 
MDS  4.80 
MPW  4.80 


>12.0  7.6 

2.12* 
2.0* 


>12.0  9.0* 
2.6* 


6.38  7.6§ 
2.12* 


5.4  9.0* 
2.6* 


5.4  9.0* 
2.6* 


6.38  7.6 

2.12* 


6.38  7.6 
2.6* 
2.06* 

6.38  7.6 

2.12* 
2.0* 

>12  8.4 
2.6* 
2  .  06* 

6.38  7.6 
2.6* 
2.12* 

6.38  7.6 
2.6* 
2.12* 


3.79*  5.2@ 
2.75@  3.5* 
2.1 
1.58 

2.89*  5.2@ 
2.75@  3.8* 
2.65 

4.83*  6.7§ 
2.75@  4.6* 
2.3 
1.8 

2.89*  7.8§ 
2.75e  3.8* 


2.89*  7.8@ 
2.75@  3.8* 


3.79*  5.2@ 
2.75@  3.8* 
2.1 
1.58 

3.59*  7.3@ 
2.65@  4.5* 
1.58 

3.79*  7.3§ 
2.75§  3.5* 
1.58 

4.6*  7.8§ 
2.75§  4.6* 
1.7 

4.83*  7.3§ 
2.75@  3.8* 


4.83*  5.2§ 
2.75@  3.8* 
2.1 
1.58 


9.0*  8.0§ 
3.62§  4.5* 


12. 2§  5.5§ 
1.8 


13. 6@  8.6@ 
7.2* 


12. 2§  6.2* 
5.5@ 
1.7 

12. 2@  6.2* 
5.5@ 
1.7 

12.2  8.0@ 
3.62@  6.6* 


9.2  8.0*§ 
3.62§ 


9.0  8.0 
3.62§  4.5* 


13. 0@  15.0* 


10.0  8.6* 

3.62§  8.0 
1.58 

12.2  8.0 

3.62@  6.4* 
Table 5-1. continued

Strain  Pst  I     Eco  RI  Bam  HI  Pvu  II  Sst  I  Bgl  II  Hind  III 

MYL         4.80       6.38       TTs         3.79*  8.3§  oTs  sTo 

2.6*       3.69*  7.3@  9.0  7.8* 

*.:,-5  .    r  2.12*     2.75@  4.5*  3.62@  4.5* 


2.0*  3.5* 
1.58 


MOL         4.80       6.38       7.6         4.83*     7.3§  10.0  8.0 

2.6*  2.75§  3.8*  3.62@  6.4* 
2.12*  1.58 

CAS         3.89       5.4         9.0         3.17*     5.6^  11. 0@  9.0* 

2.5*       2.89@     3.7*  2.5 

2.3*  1.7 

SEI         4.80       6.38       9.7*@     3.79*     5.2@  12.6*  8.0@ 

2.0*       2.75@     4.1*  7.6* 
2.1 
1.58 

SEG  I     4.8         6.38       7.6@       3.91*     5.2  12.2*  8.0© 

2.2*       2.75§     3.5*  3.62  4.5* 

2.0*  2.1 
1.58 

SEG  II  4.8         6.38       5.4§       3.79*     4.8  7.6*  7.6§ 

(same  as  SPE)                 2.6*       2.75@     3.8*  3.5  7.3* 

2.06*  1.58 

SPE         4.8         6.38       5.4§       3.79*     4.8  9.3*  7.6@ 

2.6*  2.75@  3.8*  3.5  7.3* 
2.06*  1.58 

SET  I     4.8         6.38       7.6         3.79*     7.3@  12.2  6.3 

2.12*     2.75       3.8*  3.62  5.4 
2.0*  1.58 

SET  II  4.8         6.38       5.4         3.79*     5.26  13.0  4.5 

4.3*       2.75       3.8*  3.5 
2.6*  1.58 

SFM  I     3.89        (6.40)    9.0*@     2.89*     5.2@  12.2  5.5@ 

2.6*       2.75§     3.8*  1.7 
2.65 

SFM  II  4.8         6.38       7.6§       3.79*     5.2@  7.0  8.0@ 

2.12*     2.75@     3.5*  3.62  4.5 
2.0*  2.1 
1.58 
Table 5-1. continued

Strain  Pst  I    Eco  RI  Bam  HI  Pvu  II  Sst  I    Bgl  II  Hind  III 


SMA  I     4.8         6.38       9.7*       3.79*     7.3§  9.0*  8.0* 

3.3*       2.75       4.1*  3.62  6.9* 
1.58  5.5 
3.7 

SMA  II  4.8         6.38       7.6         3.79*     5.2§  9.0* 

2.2*       2.75       3.4*  3.62 
2.06*  2.1 
1.58 

STF  I     3.89       5.4         9.0*@     2.89*     5.26  12.2*  6.2* 

2.6*       2.75@     3.8*  5.5@ 
2.65  1.7 

STF  II  4.8         6.38       7.6@       3.79*     7.3  9.0*  8.0@ 

2.12*  2.75§  3.5*  3.62  4.5* 
2.0*  1.58 

XBJ         3.89       5.40       8.7*       2.89*     7.8@  12.2  6.2 

>10.0     4.4         2.75       3.5*  9.0  5.5§ 
2.4* 
2.0* 

XBS         3.89       >10.0     9.0*       2.89*     7.8  9.0  6.2 

2.0*       2.75       3.5*  5.5@ 

1.7 

ZBNl       3.89       5.4         9.0*       2.89*     7.8@  9.30  6.2* 

2.6*       2.75       4.5*  5.5@ 

1.7 

ZRU  I     4.8         6.38       7.6         3.79*     7.3@  9.0  8.0 

3.1*       2.75@     2.9*  3.62  11 
2.12*  1.58 

ZRUII     4.8         6.38       7.6         3.79*     7.8  9.0  8.0 

3.1*  2.75@  4.2*  3.62  10.0 
2.12* 

ZYD  I     3.89       5.4         9.0         2.89*     7.8  9.2*  6.2* 

2.6*       2.75@     4.5*  5.5@ 

1.7 

ZYD  II  4.8         6.38       7.6         3.79*     7.3  8.4*  8.0§ 

2.2*       2.75@     3.6*  3.62 
2.0*  1.58 
Table 5-1. continued


Strain  Pst  I     Eco  RI  Bam  HI  Pvu  II  Sst  I     Bgl  II  Hind  III 


ZYP  I 

3  .89 

5.4 

9.0*§ 
3.1* 

2.89* 
2.75@ 

7.8e 
3.5* 

9.2* 

6.2 

5.5§ 

1.7 

ZYP  II 

4.8 

6.38 

7.6@ 
2.2* 
2.0* 

3  .79* 
2.75@ 

7.3§ 
2.9* 
1.58 

9.0* 
3.62 

8.0@ 
9.4 

KAR  I 

3  .89 

7.2 

10.0* 
4.4 

2.89* 

1.85 

0.9§ 

4.1* 
3.8 

11.7 

6.2* 

5.5 

1.7 

KAR  II 

4.8 

7.2 

7.6 
2.3* 

4.2* 

7.3@ 
4.1* 
1.58 

11.7 

8.0 
4.5* 

COK 

3  .  89 

6.4 

5.4@ 
3.6* 
2.6* 

2.89* 
2.75§ 

5.2@ 
3.8* 
2.9 

12.2 

6.6 

5.5§ 

1.7 

CRV 

3.89 

9.8 

5.4 

3.6* 

2.6* 

2.89* 
2.75§ 

5.2@ 
4.6* 
2.65 

12.2 

7.0 
5.  6§ 
1.7 

CRP  I 

3.89 

6.4 

5.4@ 
3.6* 
2.5* 

2.89* 
2.75 

5.2e 
3.8* 
2.9 
2.65 

12.2 
9.0 

6.6 

5.5§ 

1.7 

CRPII 

3.89 

>10.0 

5.4@ 
3.6* 
2.5* 

2.89* 
2.75 

3.8* 
2.90 

(9.0) 

6.6 

5.5@ 

1.7 

PAH 

3.39 

>10.0 

9.0§ 
4.2* 

2.89§ 
2.75 

5.2@ 
3.8* 
2.2 

8.8* 
6.5 

10.8 

PTX 

4.14 
3.68 

>10.0 

5.4§ 
3.6* 
2.5* 

5.7*@ 

5.7@ 
3.65@ 
3.5* 
2  .  65 

12.2 
11.7 

6.7* 
5.9* 
1.7 

@ indicates restriction fragments that hybridize to the 5' region
of the Ab probe.

* indicates restriction fragments that hybridize to the 3' region
of the Ab probe.

  indicates  restriction  fragments  that  have  double  dosage. 
Table 5-1. continued

Strain  Pst  1    Eco  Rl  Bam  HI  Pvu  II  Sst  1  Bgl  II  Hind  III 

B10.D2  3.89       574         97o         2.89  sTl  12 . 2  672 

2.6         1.85  3.8  2.5 

0.9  2.65  1.7 

BIO.F  3.89       5.4         5.4         2.89  5.2  11.7  10 

3.6         2.75  3.8  5.5 

2.6  2.65  1.7 

BIO.Q  3.89       5.4         5.4         2.89  5.2  11.7  10 

3.6         2.75  3.8  5.5 

2.6  2.65  1.7 

BIO.RIII       3.89       5.4         9.0         2.89  5.2  12.2  6.2 

2.6         2.75  3.8  5.5 
2.65  1.7 

BIO.SM  3.89       18  9.0         2.89  5.2  12.7  8.5 

2.6         2.75  3.8  5.5 
2.65  1.7 

B10.SAA48     3.89       chk         9.0         2.89  7.8  12.7  6.2 

2.6         2.75  3.8  5.5 

1.7 

B10.KEA5       3.89       5.4         5.4         2.89  5.2  11.7  10 

3.6         2.75  3.8  5.5 

2.6  2.65 

B10.CAA2       3.89       5.4         5.4         2.89  5.2  11.7  10 

3.6         2.75  3.8  5.5 

2.6  2.65 

B10.STC77     3.89       5.4         5.4         2.89  5.2  11.7  10 

3.6         2.75  3.8  5.5 

2.6  2.65 

B10.BUA16     3.89       5.4         9.0         2.89  5.2  11.7  6.2 

2.6         1.85  3.8  5.5 

0.9  2.65  1.7* 

METKOVICl     3.89       5.4         9.0         2.89  5.2  12.2  6.2 

2.6         2.75  3.8  5.5 
2.65  1.7 

METK0VIC2     3.89       5.4         5.4         2.89  5.2  11.7  8.5 

3.6         2.75  3.8  5.5 

2.6  2.65  1.7 
Table 5-1. continued

Strain  Pst  I     Eco  RI  Bam  HI  Pvu  II  Sst  I     Bgl  II  Hind  III 


t"^  3.89       5.4         9.0         2.89       5.2  12.7  6.2 

2.6         2.75       3.8  5.5 
2.65  1.7 

t"^  3.89       18  5.4         2.89       5.2  12.2  10 

3.6  2.75  3.8  5.5 
2.6  2.65  1.7 

tw32  3.89       18  5.4         2.89       5.2  12.2  10 

3.6  2.75  3.8  5.5 
2.6  2.65  1.7 

BELGRADEl     3.89       5.4         9.0         2.89       7.8  12.2  6.2 

8.0         2.75       3.8  5.5 

1.7 

BRN02  3.89       5.4         n.d.        2.89       5.2  11.7  n.d. 

2.75  3.8 
2.65 

VIB0RG5         3.89       5.4         5.4         2.89       5.2  11.7  8.5 

3.6  2.75  3.8  5.5 
2.6  2.65  1.7 

VIB0RG8         3.89       5.4         5.4         2.89       5.2  11.7  8.5 

3.6  2.75  3.8  5.5 
2.6  2.65  1.7 

B10.CAS2       3.89       5.4         9.0         2.89       5.2  11.7  6.2 

2.6         2.75       3.8  5.5 
2.65  1.7 

THONBURIl     3.89       5.4         11  2.89       5.5  12.7  11 

2.6         2.75       3.8  2.5 
2.3  1.7 

TH0NBURI2     3.89       5.4         11.0       2.89       5.2  11.7  11 

2.6         2.75       3.8  2.5 
2.3  1.7 

PANCEVO-d     3.89       5.4         9.0         2.89       7.8  12.2  6.2 

2.0         2.75       3.5  5.5 

1.7 

BIO  4.80       6.38       7.6         3.79       7.3  9.0  8.0 

2.12  2.75  3.5  3.62  4.5 
2.0  1.58 
Table 5-1. continued

Strain  Pst  I     Eco  RI  Bam  HI  Pvu  II  Sst  I     Bgl  II  Hind  III 


BIO.M 


4.80  6.38  7.6  4.83  7.3  10  8.0 
2.12  2.75  3.8  3.62  6.6 
2.0  1.58 


BIO.WB  4.80       6.38       7.6         4.83       7.3         11  8.0 

2.12  2.75  3.8  3.62  6.6 
2.0  1.58 


BIO.S 


4.80  17 


7.6 
2. 12 
2.0 


3.79 
2.75 


5.2 
3.8 
2.1 
1.58 


9.5 
3.62 


8.0 
7.4 


B10.STC90     4.80       6.38       9.7  3.79 

2.0  2.75 


7.3 
3.8 
1.58 


9.0 
3.62 


8.0 
4.5 


W12A 


STU 


AZROUl 


4.80       6.38  9.7 
2.0 


4.8         6.38  9.7 
2.0 


4.80       6.38  7.6 
2.12 
2.0 


FAIYUM3         4.80       6.38  7.6 

2.12 
2.0 

FAIYUM4         4.80       *6.38  9.7 

2.0 


FAIYUM5         4.80       *6.38  12.2 

2.12 


JERUSALEM3  4.80       chk  7.6 

2.12 
2.0 


3.79 
2.75 


3.79 
2.75 


3.79 
2.75 


3.79 
2.75 


3.79 
2.75 


3.79 
2.75 


3  .  79 
2.75 


5.2 
3.8 
2  . 1 
1.58 

5.2 
3.8 
2.1 
1.58 

7.3 
3.5 
1.58 

7.3 
3.8 
1.58 

5.2 
3.8 
2  . 1 
1.58 


12.6 


5, 
3, 
2, 
1, 

7, 
3. 
1. 


2 
8 
1 

58 

3 
5 

58 


12.6 


9.0 
3.62 


10.0 
3.62 


12.6 


12.6 


9.0 
3.62 


8.0 
7.4 


8.0 
7.4 


8.0 
4.5 


8.0 
6.6 


8.0 
7.0 


8.0 
7.0 


8.0 
4.5 
Table 5-1. continued

Strain  Pst  I     Eco  RI  Bam  HI  Pvu  II  Sst  I     Bgl  II  Hind  III 


JERUSALEM4  4.80       6.38       7.6         4.83       7.3         11.0  8.0 

2.12  2.75  3.8  3.62  6.6 
2.0  1.58 

METK0VIC3     4.80       12.0       7.6         3.79       5.2         9.5  8.0 

2.12       2.75       3.8         3.62  7.4 
2.0  2.1 
1.58 

t"^2  4.80       6.38       9.7         4.83       7.3         13.6  8.0 

2.0         2.75       3.8  7.0 
1.58 

TT6  4.80       6.38       9.7         4.83       7.3         13.6  8.0 

2.0         2.75       3.8  6.6 
1.58 

BRNOl     ,         4.80       6.38       7.6         4.83       5.2         11.1  8.0 

2.12       2.75       3.8         3.62  7.4 
2.0  2.1 
1.58 

i"^'  4.80       6.38       9.7         4.83       7.3         13.6  8.0 

2.0         2.75       3.8  7.0 
1.58 

t"^  4.80       6.38       7.6         3.79       7.3         9.0  8.0 

2.12  2.75  3.5  3.62  4.5 
2.0  1.58 

f. 

CADIZl  4.80       6.38       9.7         3.79       5.2         9.5  8.0 

...  2.38       2.75       3.8         3.62  7.3 

W    .   .  2.1 

1.58 

PANCEVO-b     4.80       6.38       7.6         3.79       7.3         n.d.  8.0 

2.92       2.75  2.8 
1.58 


n.d.: data not available.
different species and subspecies. For example, the lineage 2A
alleles C57BL/10 and SEG I, whose restriction maps resemble
each other, are found in M. m. domesticus and M. spretus,
respectively. Likewise, MET2 and CRP I, both of which are
lineage 1 alleles, are found in M. m. domesticus and
M. cervicolor, respectively, indicating that Mhc genes evolve
in a trans-species fashion.

Lineage  Distribution  of  Ab  Alleles  Within  the  Genus  Mus 


Once the Ab genes derived from the different species and
subspecies of the genus Mus had been classified into
evolutionary lineages (i.e. 1, 2A, 2B and 3), the distribution
patterns of these lineages in Mus were determined. Figure 5-2
presents a phylogenetic tree built on the basis of the
evolutionary relationships of these separate lineages of Ab
genes in various species and subspecies of Mus. Several
additional features of the trans-species evolution of these Ab
lineages are revealed by this analysis. This study has
expanded the analysis of M. spretus, M. spretoides, and
M. spicilegus to include a total of 20 H-2 haplotypes. Ab
alleles from lineages 1 and 2A were found in all three of
these aboriginal mouse species. Ab alleles from both lineages
are also present in M. caroli, indicating that alleles in these
two lineages diverged at least 2.5 million years ago. The
emergence of lineages 2B and 3 must be very recent, as
both are found only in subspecies of the M. musculus complex,
which are estimated to have diverged at least 0.4 million
years ago. It is worth noting that although lineage 3 alleles
are found in both M. m. musculus and M. m. domesticus, the
lineage 2B allele is found only in M. m. musculus. However, as
shown above, lineage 3 alleles are derived from a lineage 2B
allele. The failure to identify a lineage 2B allele in
M. m. domesticus may indicate that it has been lost from
natural populations, or may be due to the low number of
alleles sampled in our collection.

On the basis of the distribution pattern of each
lineage of the Ab gene, it was concluded that the lineage 1, 2A,
2B and 3 alleles had persisted through at least five, three,
and one speciation events, respectively, during the course of
Ab gene evolution.

Phylogenetic Relationships of 86 Ab Genes in the Genus Mus

Restriction mapping and DNA sequencing enabled us to
determine not only the quantity of DNA sequence variation but
also the nature of this variation. Phylogenetic analysis
based on the restriction map and sequence data can provide
considerable information concerning the origins of
different sequence types.

To investigate the phylogenetic relationships among the Ab
genes of the genus Mus, we analyzed the restriction map data by
the parsimony method. A 16-kilobase region around the Ab gene
in each allele was examined with seven restriction
endonucleases (Bam HI, Eco RI, Hind III, Pvu II, Pst I, Bgl
II, and Sst I). As expected, the gain and loss of restriction
sites accounted for most of the polymorphism observed (Nei
1987). In addition, several major insertions, which have been
used to delineate the evolutionary lineages, were also
detected. A total of 86 alleles were identified among 115 H-2
haplotypes on the basis of restriction site polymorphisms and
3 different sizes of retroposon insertions. Using each polymorphic
restriction site as a character, it became possible to
reconstruct the phylogenetic relationships from the restriction
map data.

Phylogenetic trees can be constructed in many
different ways, often with slightly different results
(Felsenstein 1982). The method used in this analysis is the
"mixed parsimony" method supplied in Felsenstein's PHYLIP package.
This algorithm does not produce a rooted tree.
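The idea of counting mutational steps on a tree can be illustrated with a short sketch. The code below is not the PHYLIP mixed parsimony program used in this study. It applies Fitch's counting rule to unordered binary characters on a small rooted tree, and the tree shape, allele names (a1 to a4) and character strings are hypothetical values chosen only for illustration. Undetermined states are not handled.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping each leaf name to '0' or '1'
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can share a state: no extra change is needed.
        return common, left_steps + right_steps
    # No shared state: one character change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Sum the step counts over all characters (columns) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {name: seq[i] for name, seq in matrix.items()}
        total += fitch_steps(tree, column)[1]
    return total


# Hypothetical four-allele example with three restriction-site characters.
example_tree = (("a1", "a2"), ("a3", "a4"))
example_matrix = {"a1": "001", "a2": "011", "a3": "110", "a4": "110"}
print(tree_length(example_tree, example_matrix))
```

A parsimony search compares this total across candidate trees and keeps the tree or trees with the smallest value.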

Among 41 polymorphic restriction sites recognized by the 7
restriction enzymes, 29 were informative for phylogenetic
analysis (that is, each of these polymorphic restriction sites was
present in at least two alleles). A typical restriction site
allele, exemplified by B10.D2, is shown in Figure 5-3. The
full restriction site character set of the 86 Ab alleles is shown
in Table 5-2. Each allele is composed of restriction map
variants shown 5' to 3' with respect to the order of each
Table 5-2. Restriction Site Character Set of Ab Alleles


10  20  29 

B10.D2  00001011001000101110100111110 

B10.F  01001011001000100110000110110

B10.RIII  00001011001000100110100110110

B10.SM  00001011001000100110000110100

B10.SAA48  00001010001000100110100110100 

B10.BUA  00001011001000100110100111110

MET-2  01001011001000100110000110110 

TW5  00001011001000100110100110110 

TW8  01001011001000100110000110100 

BEL1  00000010001000100110100110110  10

B10.CAS2  00001011001000100110100110110 

THON1  00001010001000111110000110110

PAN-D  0001?010001001?00110100110110

BIK/g  00000011001000101110100111110 

38CH  00000011001000000110000110010 

DMA  00001001001000101110100111110 

BEP-1  000010110011???10110000110110 

DSD-1  01001011001000100110100110110 

MBB-1  00001011001000100110000110010 

MBK  00001010001000100110100110110  20 

CAS  00001010001000111110000110110 

SFM-1  00001011001000100110000110??0 

STF-1  00001011001000100110100110010 

XBJ-1  0001?010001001?00110100111110 

XBJ-2  00001010001001?00110100111??0 

XBS  0001?010001001?001101001100?0 

ZBN-1  00001010000000000110100110110 

KAR-1  00000000000000001110100111??0 

COK  01001011001000100110000110010 

CRV  01001011001000000110000110000  30 

CRP-1  01001011001000100110000110010 

CRP-2  01001011000000000110000110??0

ZYD-1  00001010001000000110100110110

ZYP-1  00000110001001?00110100110110

K  00101010010000000000000110011 

U  10101011010000000010000110011

MDL-2  00101010010000000000000110011

DBV-2  00100001010000000010000011??1

C57BL/10  00110010011001000011?10110110

B10.M  00110010011000100010010100110  40

C3H.JK  00110010011000100010010100110

B10.S  00110010111000100010010110100

B10.STC90  00010010011000100011?10110110

W12A  00010010111000100010000110110 

FAI-3  00110010011000100010010110110 

FAI-4  00010010111000100010000110110 

MET-3  00110010111000100010010110100 

twl2  00010010011000100010000100110 
Table 5-2. continued


10  20  29 

TT6  00010010011000100010000100110 

BRNO-1  00110010111000100010010100110  50 

CADIZ-1  00000010111000100010010110110 

PAN-B  0010011001101??00010010110110 

BFM  00010010011000100010000110110 

BNC  00101010011000000010010110110 

DBP  001010100111???00010011110110 

DGD  00101010011000100010000100110 

DOT  00100010011000000010010110110 

BIB-2  00100010011000100010000100110 

BEP-2  00100010111000100000011110100 

DJO-2  00100010111000100010010100110  60 

DSD-2  00100010011000100010000100110 

MAI  00110010111000100011?10110010

MBB-2  00100000010000000010000100110 

MBS-2  00100010111000100010010100110 

MBT  00101010011000000010010110110 

MDS  00111010011000100010010100110 

MPW  00101010111000100010010100110 

MYL-1  00110010011001000011?10110110

MYL-2  00101000011000000010010110110 

MOL  00101010011000100010010100110  70

SEI  00010010111000000010010110110 

SEG-1  00110010111001000011010110110 

SPE  101010?0011000100010010110110

SET-1  00110010011000100011?10110110

SFM-2  00110010111001000011010110110 

SMA-1  00000010011000000010010110110 

SMA-2  00110010111001000010010110110 

ZRU-1  0010011001101??00010010110110

ZRU-2  00100100011000000010010110110 

ZYD-2  00110010011001000010010110110  80 

ZYP-2  0011001001101??00010010110110

KAR-2  00100010011000000011?00010010


Character set derived from Figure 5-1 and Table 5-1. The
numbers at the top of the columns indicate the character numbers
described in Table 5-3. 1 indicates the presence of the
specified restriction site, 0 indicates the absence of the
restriction site, and ? indicates that the restriction site is
undetermined.
restriction enzyme. "+" and "-" indicate the presence and
absence, respectively, of a given restriction site. The
character states of the Ab alleles are explained in Table 5-3. As
the computer program supplied in Felsenstein's package has a
limited capacity and could not analyze all the alleles at one time,
each lineage of alleles was analyzed first (data not shown) to
determine the phylogenetic relationships of closely related
alleles. Then the alleles of the different lineages were pooled and
analyzed together. The most parsimonious network of the 86 Ab
alleles is shown in Figure 5-4. This phylogenetic
tree requires 96 mutational steps. The bar(s) between
alleles indicate character state changes. Branch lengths
are proportional to the number of character changes. The
distance between different alleles is proportional to
their DNA sequence divergence, as reflected by the
number of character changes between them. Alleles that
are encircled by solid lines are distinct alleles that are
shown to be phylogenetically identical by parsimony analysis.
The Mhc class II Ab genes have been divided into four
evolutionary lineages based on retroposon polymorphisms. A
remarkable feature of this Ab phylogenetic tree is that its
main branches correspond very closely to the evolutionary
lineages defined earlier. It is evident from this phylogenetic
tree that lineage 3 alleles are evolutionarily more closely
related to lineage 2 than to lineage 1 (Figure 5-4). In light
of the fact that each Ab lineage is derived from another lineage
Table 5-3. Coding of Restriction Site Data of Ab Alleles.

Character  Feature          Character  Feature
number                      number

 1         Bam HI-5.4       16         Sst I-2.3
 2         Bam HI-3.6       17         Hind III-2.5
 3         Bam HI-2.1       18         Hind III-2.5
 4         Bam HI-2.0       19         Hind III-1.7
 5         Bam HI-2.6       20         Hind III-4.5
 6         Bam HI-3.1       21         Hind III-5.2
 7         Sst I-7.8        22         Bgl II-3.62
 8         Sst I-2.6        23         Bgl II-5.1
 9         Sst I-2.1        24         Pvu II-2.75
10         Sst I-2.1        25         Pvu II-3.75
11         Sst I-1.65       26         Pvu II-0.9
12         Sst I-2.2        27         Eco RI-5.4
13         Sst I-2.8        28         Eco RI-5.4
14         Sst I-3.5        29         Pst I-1.2
15         Sst I-3.8

The data are derived from the restriction map of Figure 5-1.
by retroposon insertion, it is evident that the divergence of
alleles within each lineage occurs by the accumulation of
mutational events, mainly base substitutions.
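The relationship between character changes and allele divergence described above can be made concrete with a minimal sketch. It counts the character-state differences between pairs of alleles and lists the characters that are informative for parsimony. The character strings are copied from Table 5-2. The function names are hypothetical, skipping undetermined ("?") sites is an assumption, and the informativeness criterion used here (each state present in at least two alleles) is the standard textbook one, which may differ in detail from the criterion applied in this study.

```python
ALLELES = {
    "B10.D2":   "00001011001000101110100111110",
    "B10.F":    "01001011001000100110000110110",
    "B10.RIII": "00001011001000100110100110110",
    "MET-2":    "01001011001000100110000110110",
}


def char_differences(a, b, missing="?"):
    """Count positions where two character strings differ, skipping missing data."""
    if len(a) != len(b):
        raise ValueError("character strings must have equal length")
    return sum(1 for x, y in zip(a, b) if missing not in (x, y) and x != y)


def informative_characters(matrix, missing="?"):
    """Return 1-based character numbers where both states occur in >= 2 alleles."""
    n_chars = len(next(iter(matrix.values())))
    informative = []
    for i in range(n_chars):
        column = [seq[i] for seq in matrix.values() if seq[i] != missing]
        if column.count("0") >= 2 and column.count("1") >= 2:
            informative.append(i + 1)
    return informative


print(char_differences(ALLELES["B10.D2"], ALLELES["B10.RIII"]))
print(char_differences(ALLELES["B10.F"], ALLELES["MET-2"]))
print(informative_characters(ALLELES))
```

Alleles whose strings show zero differences, such as B10.F and MET-2 here, cannot be separated by these characters.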

The phylogenetic tree shown in Figure 5-5 suggests that
each lineage contains a few meaningful sublineages (designated
by circles drawn with broken lines). These sublineages each contain a
cluster of closely related alleles. In some sublineages, the
clustered alleles are derived from different Mus species, for
example, MYL1, C57BL/10, SEG1, and SFM1, suggesting that a
trans-species mode of evolution operates on the Ab gene.
Occasionally, clusters of alleles are derived from the same
subspecies, e.g., CAS and THON1, both of which belong to M.
m. castaneus.
CHAPTER 6
DISCUSSION 

Function  of  Mhc  Genes 

The function of Mhc molecules is to present antigen to
T cell receptors on thymus-derived lymphocytes (reviewed by
Klein 1986). T cell responses to antigen have a dual
specificity: one for the protein antigen itself and another
dictated by the allelic form of the Mhc molecules (reviewed
by Schwartz 1985). The molecular basis of this "Mhc-restricted
recognition" is explained by the remarkable finding
that Mhc molecules are actually peptide carriers or receptors.
The physical complex of peptide fragments and Mhc molecules
is what interacts with T cells (Buus et al. 1987; Allen et al.
1987). X-ray crystallographic studies of the three-dimensional
structure of an Mhc class I molecule revealed a putative
peptide-binding groove lined with the most polymorphic residues
of Mhc polypeptides (Bjorkman et al. 1987a, 1987b). These
observations suggest that the majority of class II gene
polymorphisms may dictate the binding specificity of class II
molecules for antigenic peptides.
Features of Mhc Polymorphism

The unusual genetic features of Mhc genes suggest that
novel evolutionary mechanisms must operate on these genes.
Analysis of the unprecedented genetic diversity of Mhc loci
has indicated four important properties of Mhc genes (reviewed
by Potts & Wakeland 1990). First, selective neutrality is
inconsistent with the observations made from population data,
suggesting that some form of balancing selection is operating on
Mhc loci. Second, population analyses indicate that
selection is operating in contemporary populations and is not
episodic with long intervening periods of neutrality. Third,
diversifying selection is operating directly on the ABS.
Fourth, selection may be strong enough, at least for species
like Mus, to measure directly in population studies. As many
of the polymorphic amino acid residues of class II molecules
occur within the ABS, these allelic molecules may have
different binding properties. Consequently, these variations
may alter the immune response of individuals to foreign
antigen. Although a wealth of information regarding the
functional and structural properties of Mhc molecules is currently
available, little is known about the significance of Mhc polymorphism.
The selective forces involved remain elusive (Klitz et al.
1986; Potts et al. 1988).
Mechanism of Generation of Ab Gene Polymorphisms

Mutational changes in DNA can be classified into four
types: substitution, deletion, insertion and inversion. RFLP
analysis is able to detect all four types of DNA change,
although it is most efficient in detecting deletions,
insertions and inversions. Substitutions are detectable only
when point mutations alter the recognition
sequences of restriction enzymes. Therefore, RFLP analysis
tends to underestimate the degree of substitution in
comparison with insertion, deletion and inversion.

For the seven restriction enzymes used in our analysis,
the segment of genomic DNA assayed by the Ab gene probe
spanned about 16 kb. Therefore, the polymorphic restriction
sites revealed in this study are distributed over a fairly
large segment of DNA. As the Ab gene is encoded by 700 bp
of exonic DNA, the majority of the DNA examined by RFLP analysis
consists of noncoding regions such as introns and flanking
regions. Thus, the restriction site polymorphisms detected
largely reflect DNA sequence variation in the noncoding regions.
Inspection of the restriction maps of the 86 Ab alleles in our
analysis indicated that, in addition to the three distinct
insertion events that constitute the basis of the evolutionary
lineages, most restriction site polymorphisms are caused by
point mutations.
Mhc Genes Evolve via Trans-species Mode


If Mhc polymorphism arose exclusively after the
initiation of speciation, then one would expect Mhc alleles
in a given species to be more closely related to each other
than they are to those in other species. However, if the Mhc
evolves in a trans-specific manner, some Mhc alleles from one
species would be expected to resemble those from other species
more closely than they resemble other alleles of their own species.

A number of studies exploring the genetic diversity of
Mhc class I and II genes indicate that a considerable
proportion of the polymorphisms of contemporary alleles
predate speciation events, i.e., the Mhc genes evolve in a
trans-species manner and, during the course of gene evolution,
diverge by slowly accumulating point mutations (McConnell
et al. 1988; Figueroa et al. 1988; Lawlor et al. 1988; Mayer
et al. 1988). Previously, McConnell et al. (1988) demonstrated
that alleles of the Mhc class II Ab gene can be organized into 3
evolutionary lineages based on their genomic structures. The
evolutionary relationship between lineages 1 and 2 is that
lineage 2 alleles were produced from lineage 1 alleles by an
861 bp retroposon insertion in the intron separating the Aβ1 and
Aβ2 exons. The evolutionary relationships among these
lineages of alleles were first elucidated by determining the
DNA sequence of intron 2 from a lineage 3 allele (k
haplotype). This sequence analysis has provided some unique
insight into the mechanism(s) generating Ab gene
polymorphism. On the basis of sequence data, PCR
amplification and restriction analysis, the Ab genes were
reorganized into four evolutionary lineages: 1, 2A, 2B and 3.
The results of this analysis clearly indicate that the four Ab
lineages were derived by three independent, successive
retroposon insertions in the intron between the Aβ1 and Aβ2 exons.
The lineage 2B allele was generated from a lineage 2A allele by the
insertion of a B1 family repeat. Subsequently, a member of another
new family of retroposon, 539 bp in length,
integrated into a lineage 2B allele, thus generating the lineage
3 allele. Lineage 1 alleles are present in all species and
subspecies of the genus Mus examined so far, suggesting that
it is probably the most ancient lineage of Ab genes. Lineage 2A
alleles have been identified in one Asian species, Mus caroli, in three
aboriginal species, Mus spicilegus, Mus spretoides, and Mus
spretus, and in Mus m. musculus and Mus m. domesticus.
Lineage 2B alleles have been found only in M. m. musculus to date.
However, lineage 3 alleles are present in both M. m. musculus and
M. m. domesticus. Simultaneously, during the course of Ab
gene evolution, the progenitor alleles generated by
retroposon insertion accumulated mutational changes, leading
to the formation of cluster(s) of alleles closely related to
each other.
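The lineage assignment described above amounts to a simple decision rule on the presence or absence of the three intron 2 insertions. The following sketch encodes that rule. The type and function names are hypothetical, and the rule assumes the strictly successive order of insertions proposed here, so any other combination is reported as unexpected rather than assigned to a lineage.

```python
from typing import NamedTuple


class Intron2Inserts(NamedTuple):
    """Presence or absence of the three intron 2 insertions (hypothetical type)."""
    retroposon_861bp: bool  # insertion distinguishing lineage 2 from lineage 1
    b1_repeat: bool         # B1 family repeat defining lineage 2B
    insert_539bp: bool      # 539 bp insertion defining lineage 3


LINEAGE_BY_INSERTS = {
    (False, False, False): "1",
    (True, False, False): "2A",
    (True, True, False): "2B",
    (True, True, True): "3",
}


def classify_lineage(inserts: Intron2Inserts) -> str:
    """Map an insertion pattern to lineage 1, 2A, 2B or 3."""
    key = tuple(inserts)
    if key not in LINEAGE_BY_INSERTS:
        raise ValueError(f"insertion pattern {key} does not fit the successive model")
    return LINEAGE_BY_INSERTS[key]


print(classify_lineage(Intron2Inserts(True, True, False)))
```

Because the insertions are assumed to be nested, only four of the eight possible presence and absence patterns correspond to a lineage.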
Possible Impact of Retroposons on Ab Gene Expression


Since retroposons are dispersed through the host DNA by
duplicative retroposition, it is likely that they have a major
impact on genomes. The most obvious is their mutagenic
potential due to the disruption of sequences at the site of
integration (Chao et al. 1983). Retroposon integrations in
exons and regulatory regions would result in null
alleles and might be selected against even in the heterozygous
state. However, retroposon insertions in introns and
intergenic regions are more likely to be neutral (reviewed by
Deininger 1990). In addition, there are several examples of
SINE elements found in noncoding and coding regions of
numerous genes without deleterious effects. Inserted
SINE elements have been used as polyadenylation
signals, as portions of coding sequences, and as termination
signals. In addition, SINEs have been implicated in
recombination (Lehrman et al. 1987), in limiting gene
conversions (Hess et al. 1983), and in mobilizing unrelated DNA
sequences throughout the genome, either via retroposition of
sequences adjacent to SINEs (Zelnick et al. 1987) or by
facilitating recombination. The SINE elements and repetitive
family member identified in the various Ab lineages are all
positioned in intron 2. Presumably, these retroposons do
not have any drastic impact on Ab gene function, as this intron
retains a 5' splice site with a GT dinucleotide and a 3' splice site
with an AG dinucleotide. Consequently, these inserted sequences
would be removed from primary transcripts by RNA processing.
However, in studies of Ab gene expression using the DNase I
hypersensitivity (DH) assay, it has been shown that DH sites
unique to transcriptionally active tissues map within
SINE elements (McIndoe et al. 1990). These data suggest that
the retroposon insertions in the gene may have a subtle,
unrecognized effect on the expression of Ab genes. The
influence of these retroposon insertions on Ab gene expression
may be significant in light of the fact that numerous studies
have shown that the level of Ia antigen expression is critical to
the efficiency of antigen presentation to T cells (Matis et
al. 1983; Janeway et al. 1984). Moreover, the exceptionally
high abundance of SINEs in the intron may reflect a more open
chromatin structure associated with such genes in the germ
line (Slagel et al. 1987).

Linkage  Disequilibria  Among  Restriction  Sites 

Among the 115 H-2 haplotypes studied in this
dissertation, a total of 86 Ab alleles were identified by RFLP
analysis. Close inspection of their restriction maps revealed
one unusual feature of the restriction-site polymorphism:
there is a strong nonrandom association among the polymorphic
restriction sites. This nonrandom
association, or linkage disequilibrium, occurs mainly because
restriction sites and their neighboring genetic loci are
tightly linked (Nei 1987b). During the construction of the
phylogenetic tree of Ab genes, 29 informative sites were
uncovered and used for parsimony analysis. Therefore, one
would predict that 2^29 (> 52 x 10^7) different alleles could be
generated if restriction sites combined at random.
However, our global sampling of mouse H-2 haplotypes yielded
a number much lower than that. Clearly, this is a
strong case of linkage disequilibrium. Surveys of the
distribution and frequencies of Mhc class I and class II genes
in natural populations of mice indicate that H-2 polymorphism
is not as extensive as would be predicted if the diversity of
these genes were unlimited (Wakeland & Nadeau 1980). Studies of
H-2 and allozyme polymorphisms with respect to geographical
and temporal distribution in wild mice have indicated that H-2
polymorphisms are more uniformly distributed than allozymes
(Nadeau et al. 1988). Taken together, these data indicate
that some alleles are selectively maintained in many
populations, as suggested previously (Wakeland & Nadeau 1980).
If the only selective pressure operating on H-2 genes were
random diversification, then natural populations should
contain a virtually unlimited number of H-2 alleles. However,
the analysis of class I and class II genes suggests that they
are present at appreciable frequencies in different natural
populations of mice and are more uniformly distributed than
indicated that similar or identical H-2 genes can be
identified in both laboratory inbred strains and wild mice.
These observations can be interpreted as evidence that
selective pressures are operating to restrict the polymorphism
of H-2 genes (Wakeland & Nadeau 1980).
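The arithmetic behind the linkage disequilibrium argument can be checked with a short sketch. It compares the number of combinations expected from 29 independent binary sites with the 86 alleles observed, and it also computes the classical pairwise disequilibrium coefficient D = p11 - p1 * p2 for two sites. The function name and the toy haplotype strings are hypothetical, and D is included only as a standard illustration, since no such coefficient was calculated in this study.

```python
N_INFORMATIVE_SITES = 29  # informative restriction sites used for parsimony
N_OBSERVED_ALLELES = 86   # Ab alleles identified among 115 H-2 haplotypes

possible_combinations = 2 ** N_INFORMATIVE_SITES
observed_fraction = N_OBSERVED_ALLELES / possible_combinations
print(possible_combinations, observed_fraction)


def ld_coefficient(haplotypes, i, j):
    """Return D = p11 - p_i * p_j for sites i and j (0-based indices).

    haplotypes -- list of equal-length strings of '0', '1' or '?';
    haplotypes with an undetermined state at either site are skipped.
    """
    usable = [h for h in haplotypes if h[i] in ("0", "1") and h[j] in ("0", "1")]
    if not usable:
        raise ValueError("no haplotype has both sites determined")
    n = len(usable)
    p_i = sum(h[i] == "1" for h in usable) / n
    p_j = sum(h[j] == "1" for h in usable) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in usable) / n
    return p_ij - p_i * p_j


# Toy data: perfectly associated sites versus independent sites.
print(ld_coefficient(["11", "11", "00", "00"], 0, 1))
print(ld_coefficient(["11", "10", "01", "00"], 0, 1))
```

A value of D near zero indicates random association of the two sites, whereas values far from zero indicate that particular site combinations occur more or less often than expected.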

Maintenance  of  Mhc  Polymorphism 

Although there are numerous mechanisms that could
contribute to the maintenance of polymorphism, only a few are
likely to apply to the Mhc. These are overdominant
selection, high mutation rates, neutrality, frequency-dependent
selection, variation in pathogen assemblages across
space and time, mating preferences, and transmission
distortion favoring Mhc heterozygotes. The mutation rate at
the Mhc loci is not particularly high, as shown by Hayashida
& Miyata (1983). The allelic frequencies of HLA are too
regular to be compatible with neutrality expectations (Hedrick
& Thomson 1983), and neutrality is too weak a force to account
for the degree of H-2 polymorphism in local populations of
Mus (Potts et al. 1987).

Frequency-dependent selection favoring rare alleles is
a more potent mechanism for maintaining polymorphism than
heterozygote advantage (Hedrick 1972). It is theoretically
appealing because rare Mhc alleles might enjoy an
advantage in the molecular arms race against pathogens (Bodmer
1972). However, it is difficult to demonstrate frequency-dependent
selection caused by pathogen evolution, as it
requires long-term studies to observe the cycles due to pathogen
evolution.

If  pathogen  assemblages  vary  in  space  and  time,  and 
specific  Mhc  alleles  are  more  effective  against  one  subset  of 
pathogens  than  others,  then  natural  selection  would  favor 
different  subsets  of  Mhc  alleles  according  to  the  current 
pathogen  assemblages.  This  type  of  selection  would  contribute 
to  the  maintenance  of  Mhc  polymorphism  because  different 
alleles  would  be  maintained  in  different  populations. 
Unfortunately,  the  data  available  concerning  pathogens  are 
not  sufficient  to  test  this  hypothesis. 

Disassortative  mating  according  to  Mhc  genotypes  would 
contribute  to  the  maintenance  of  polymorphisms.  This 
mechanism  involves  olfaction  and  the  genes  responsible  have 
been  mapped  to  Mhc  loci  (Yamazaki  1976;  Boyce  et  al.  1983). 

Transmission distortion, proposed by Clarke and Kirby
(1966), also favors the production of Mhc heterozygotes. On
the surface, both mating preferences and transmission
distortion resemble heterozygote advantage in that they result
in an excess proportion of heterozygotes. However, they are
more effective at maintaining polymorphism because rare
alleles have an advantage in all genotypes, whereas under
heterozygote advantage, rare alleles enjoy an advantage only
in the heterozygous condition (reviewed by Potts et al. 1988).
Overdominant Selection for Mhc Polymorphism

The extraordinary polymorphism of Mhc genes sets them
apart from all other known genetic loci. It is generally
believed that the Mhc loci have been molded by special forces
not acting, or at least not to the same degree, on other loci
(Klein & Figueroa 1986; Klein et al. 1989). On the one hand,
there is no doubt that Mhc loci are subject to negative
purifying selection, which eradicates functionally unfit
variants, as can be judged from the fact that the diversity of
these genes is not unlimited. But this type of selection
probably also acts on most other functional loci. On the
other hand, one wonders whether Mhc loci are also subject to
positive selection which, for example, provides an advantage
to individuals heterozygous at Mhc genes. Although some
observations indicate that the Mhc loci of certain species are
not polymorphic, or at least not highly polymorphic (Figueroa
et al. 1986; Watkins et al. 1988), there is some evidence
suggesting that positive selection is operating to drive the
diversification of Mhc genes. Hughes and Nei's (1988, 1989)
analysis of the pattern of nucleotide substitution at
synonymous and nonsynonymous positions in the codons of the ABS
provided one of the most convincing arguments for positive
Darwinian selection. Their approach was to compare the
nucleotides constituting the ABS with those coding for the rest
of the gene. The role of positive selection in
enhancing the diversity is indicated by the fact that the rate
of nonsynonymous substitutions in the ABS is higher than would
be expected if the substitutions were neutral. In the rest of the
molecule, nonsynonymous substitutions are lower than expected,
indicating that negative selection is acting on the
corresponding portions of the genes. Positive selection may
act via heterozygote advantage (overdominant selection), in
which the superior ability of Mhc heterozygotes to bind and
present antigen enhances their resistance to infectious
diseases, thus increasing their relative fitness in the
population. Overdominant selection is also known to enhance
the rate of amino acid substitution and to increase the
heterozygosity and persistence of polymorphic alleles
enormously compared with those of neutral alleles (Maruyama
& Nei 1981; Nei 1987b). The conservation of evolutionary
lineages over long periods can also be explained by assuming
that positive selection has been acting on the functional Mhc
genes through overdominant selection.
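The distinction between synonymous and nonsynonymous changes that underlies this argument can be shown with a minimal sketch. The code classifies each differing codon in two aligned coding sequences according to whether the encoded amino acid changes, using the standard genetic code. It is not the substitution rate estimation carried out by Hughes and Nei, which also normalizes by the numbers of synonymous and nonsynonymous sites. The function name and the two short sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be aligned, in frame, of equal length, and
    contain only the letters T, C, A and G.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for k in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[k:k + 3], seq_b[k:k + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical sequences: TTT->TTC is silent (Phe), GAT->GAA changes Asp to Glu.
print(count_codon_changes("TTTGATGGA", "TTCGAAGGA"))
```

An excess of nonsynonymous over synonymous change, once properly normalized, is the signature of positive selection discussed above.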

Divergent  Allele  Advantage 

Although overdominant selection alone may explain the
number of Mhc alleles prevalent in natural populations and the
retention of ancestral polymorphisms, the extensive sequence
diversity between alleles in the Aβ1 exons indicates that another
selective mechanism specifically enhancing diversification
must also be operative (Wakeland et al. 1990a). This type of
the other two forms of balancing selection (i.e., overdominant
and rare allele advantage) commonly thought to operate on Mhc
genes (Bodmer 1972; Zinkernagel & Doherty 1974). All three
types of selection would contribute to the maintenance of Mhc
polymorphism and of highly divergent alleles within populations
(Wakeland et al. 1990b).

Alu-like  Repetitive  Elements  in  Genes 

SINE  as  Evolutionary  and  Genetic  Tags 

Interspersed  repetitive  DNA  sequences  have  been 
discovered  in  the  genomes  of  all  vertebrate  species  studied 
to  date  (Schmid  &  Jelinek  1982;  Jelinek  &  Schmid  1982).  Many 
of  these  repetitive  DNA  families  are  present  in  extremely  high 
copy numbers. On average, Alu elements appear every 5 kb,
so it is not surprising that the intron between the Aβ1 and Aβ2
exons of Ab genes contains three different sizes of retroposon
inserts in alleles of the various lineages. Moreover, these three
retroposon  insertions  were  produced  from  three  successive 
independent  insertional  events  resulting  in  the  formation  of 
four  evolutionary  lineages.  Alu  elements  in  specific 
locations  have  been  used  as  markers  to  study  gene  and  genome 
evolution  (Barsh  et  al.  1983;  Ruffner  et  al.  1987). 
Previously,  McConnell  et  al.  (1988)  proposed  that  SINE 
retroposons  can  be  used  as  evolutionary  tags  for  Mhc  class  II 
genes. In this dissertation, two additional SINE retroposons
were identified and used to further dissect the evolutionary
course  of  Ab  genes.  On  the  basis  of  divergence  time  estimated 
from  studies  of  different  species  and  subspecies  of  Mus,  these 
evolutionary  tags  can  be  used  as  a  molecular  clock  to  estimate 
the  time  of  divergence  of  different  lineages  of  Ab  alleles. 
Recently, Pozzo and coworkers (1990) used the presence
of an Alu repeat in the 5' flanking region of DQ genes to
infer the phylogenetic relationship of DQA1 and DQA2. It is
generally accepted that repetitive elements have transposed
over evolutionary time. Yet it is
very difficult, in higher eukaryotes, to demonstrate the
transposition of a repeat family in contemporary populations.
The insertion of Alu-like repeats has been shown to result in
intraspecies polymorphisms within the genera Mus (Kominami et
al. 1983) and Rattus (Schuler et al. 1983). Consistent with
these observations is the finding that the lineage 2B allele,
distinguished from lineage 2A by an additional B1 family
repeat, is found in only one subspecies of the M. musculus
complex, in contrast to lineage 2A, which is found in three
subspecies of the M. musculus complex. Likewise, the lineage 3
alleles, derived from lineage 2B alleles by a 539 bp
insertional event, are identified in two subspecies of the M.
musculus complex. In summary, the different retroposon
inserts have created both intra- and inter-species
polymorphisms.
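
The use of a shared insertion as a dating tag can be illustrated with a minimal sketch. It assumes that an insertion found in several taxa is identical by descent, so that it must be at least as old as the deepest split among those taxa. The taxon names and divergence times below are hypothetical placeholders, not values from this study.

```python
from itertools import combinations

# Hypothetical pairwise divergence times in millions of years.
# These are placeholders for illustration only, not measured values.
DIVERGENCE_MYR = {
    frozenset({"taxon_A", "taxon_B"}): 0.5,
    frozenset({"taxon_A", "taxon_C"}): 2.5,
    frozenset({"taxon_B", "taxon_C"}): 2.5,
}


def minimum_lineage_age(taxa_with_tag):
    """Lower bound on the age of an insertion shared by the given taxa.

    Assumes the insertion is identical by descent in every taxon listed,
    so it predates the oldest split among them. A tag seen in a single
    taxon yields no lower bound from this argument (returns 0.0).
    """
    taxa = sorted(set(taxa_with_tag))
    if len(taxa) < 2:
        return 0.0
    return max(DIVERGENCE_MYR[frozenset(pair)] for pair in combinations(taxa, 2))


if __name__ == "__main__":
    print(minimum_lineage_age(["taxon_A", "taxon_B"]))
    print(minimum_lineage_age(["taxon_A", "taxon_B", "taxon_C"]))
```

The estimate is only a minimum: the insertion may be much older than the deepest split among the taxa that happen to have been sampled.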

Retroposons have been found in organisms as diverse as
bacteria and humans. These observations have supported the
view that they are a major evolutionary force contributing to
sequence duplications, dispersions and rearrangements that
maintain the fluidity of eukaryotic genomes. Because
retroposons have generated many families of pseudogenes and
transposable elements that confer no apparent advantage on the
host, it has been proposed that nonviral retroposons could be
thought of as "selfish DNA" that infests the genome but
confers little selective advantage on the host (Orgel & Crick
1980; Doolittle & Sapienza 1980).

539  bp  Retroposon:  a  Newly  Arisen  Repetitive  Family 

DNA sequence analysis of this 539 bp repeat revealed that
it is composed of a core element of 235 nucleotides, bounded
by two flanking B1 family repeats. A search of GenBank with
the sequence of the core element revealed no homology with
known sequences, suggesting that it is unique. A blot
hybridization experiment using the core element
as a probe confirmed this observation. Taken together,
these data indicate that this 539 bp repeat transposed
recently in the evolution of Ab genes. This is consistent
with the fact that the lineage 3 alleles containing this
repeat are found exclusively in M. m. domesticus and M. m.
musculus. The molecular mechanisms leading to the dispersal
of this type of retroposon are unclear. Although the core
element does not contain the putative RNA polymerase III
promoter, the internal RNA polymerase III
promoters contained within both B1 repetitive elements would
presumably cotranscribe the adjacent sequence (Rogers 1985),
which could then spread through the genome via RNA-mediated
transposition (Jagadeeswaran et al. 1981).

Transposition  of  Middle  Repetitive  Elements 

Preferential  Site  of  Integration 

The  close  similarities  in  the  structure  of  SINE  elements 
suggest  that  they  are  spread  throughout  the  genome  by  a  common 
mechanism (Schmid & Shen 1986). The majority of these SINEs
have a precisely defined 5' terminus and a variable oligo dA-rich
3' terminus, flanked by terminal direct repeats. It has
been shown that the 5' end of the direct repeats is abundant
in dA residues. Similarly, the 5' flanking region adjacent
to the 5' direct repeat is strongly biased toward d(A+T)-rich
sequences. Thus, it was concluded that regions of the genome
that are rich in d(A+T) residues are likely to be preferred
integration sites (Daniels & Deininger 1985). In keeping with
this finding, the B1 repeat found in both lineage 2B and
lineage 3 alleles is inserted into a region rich in
dA residues (Figure 4-6).
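
As an illustration of how a d(A+T)-rich candidate integration region might be located in a sequence, the following is a minimal sketch. The window size and threshold are arbitrary assumptions chosen for the example, and the function names are hypothetical; this is not the procedure used to produce Figure 4-6.

```python
def at_fraction(seq):
    """Fraction of A and T residues in a DNA sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq, window=20, threshold=0.7):
    """Return (start, fraction) for every window whose A+T fraction
    meets the threshold. Window size and threshold are illustrative
    assumptions, not values taken from the text."""
    hits = []
    for start in range(len(seq) - window + 1):
        frac = at_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


if __name__ == "__main__":
    # Toy sequence: a GC-rich context around an A+T-only stretch.
    toy = "GGGGAAAATTTTGGGG"
    print(at_rich_windows(toy, window=8, threshold=1.0))
```

Overlapping windows that pass the threshold can be merged afterwards if a single region boundary is wanted.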

In fact, there are numerous examples of SINEs integrating
adjacent to each other, sharing a set of direct repeats,
indicating that they might transpose as a single unit (Rogers
1985). Interestingly, the 539 bp repetitive
element identified in intron 2 of lineage 3 alleles is
integrated into a B1 family repeat of lineage 2A alleles.
This is consistent with observations made by others (Rogers
1985).

Possible  Transposition  Mechanism 

Structural analysis of the 539 bp repetitive element
reveals that it is composed of two Alu-like elements with
unique DNA sequence in between. This structural feature
strongly suggests that this combined unit may transpose in the
manner suggested above. The transposition mechanism would involve
the transcription of the sequence into RNA. This RNA transcript
is initiated from the 5' Alu-like repeat by the internal RNA
polymerase III promoter. Termination occurs at some point 3'
to the second Alu-like sequence, as the Alu-like repeat does not
contain a transcription termination sequence. The
non-repetitive sequence bounded by the flanking 5' and 3' Alu-like
repeats may have been cotranscribed into an RNA
transposition intermediate by readthrough synthesis from the
adjacent Alu-like repeat promoter. The RNA molecule thus made
can be converted into DNA by reverse transcriptase. The cDNA,
consisting of two Alu-like repeats flanking a non-repetitive
internal fragment, could then be inserted into a novel genomic
location. Although neither of the Alu-like elements involved is
flanked by terminal direct repeats, it is not uncommon
to find Alu family members without direct repeats. It is worth
mentioning that this new repetitive element is integrated into
a B1 family repeat.

Phylogenetic Relationship of Ab Genes

To analyze the distribution of various mutational events
in the evolutionary history of the 86 Ab alleles in our
collection, we conducted a phylogenetic analysis by
parsimony. A remarkable feature of the Ab
phylogenetic tree is that its main branches correspond very
closely to the 4 evolutionary lineages, 1, 2A, 2B and 3,
defined both by sequence analysis and restriction mapping.
It is noteworthy that the phylogenetic tree (gene tree)
constructed from the Ab gene locus does not agree with the
phylogenetic relationship of the species involved (species
tree). One of the predominant factors that lead to such a
difference is genetic polymorphism in the ancestral
species, as indicated by Pamilo & Nei (1988). The results in
Figure 5-4 demonstrate that all 86 Ab alleles we analyzed
can be grouped into at least three major clusters of alleles,
which correspond to the evolutionary lineages 1, 2A, 2B and
3 defined previously. Moreover, each cluster of alleles is
composed of alleles derived from different species and
subspecies of genus Mus, supporting the idea that Mhc alleles
evolve in a trans-species manner. Trans-species evolution,
in which polymorphisms predate the origin of the
species, has been proposed as the explanation for the
existence of identical alleles in multiple subspecies.
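
To make the idea of parsimony on restriction-site characters concrete, the sketch below scores candidate trees by the minimum number of site gains or losses they require. It uses the Fitch algorithm on a rooted binary tree as a stand-in; the text does not state which parsimony procedure or program was used, and the allele names and site patterns are hypothetical.

```python
def fitch_steps(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping leaf name to its state ('0' or '1')
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # Disjoint state sets imply one change on a branch below this node.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, site_matrix):
    """Sum the Fitch step counts over all restriction sites."""
    n_sites = len(next(iter(site_matrix.values())))
    total = 0
    for i in range(n_sites):
        states = {name: row[i] for name, row in site_matrix.items()}
        total += fitch_steps(tree, states)[1]
    return total


if __name__ == "__main__":
    # Hypothetical alleles; each string records presence (1) or
    # absence (0) of three restriction sites.
    sites = {"a1": "110", "a2": "110", "a3": "001", "a4": "011"}
    tree_1 = (("a1", "a2"), ("a3", "a4"))
    tree_2 = (("a1", "a3"), ("a2", "a4"))
    print(tree_length(tree_1, sites), tree_length(tree_2, sites))
```

Under this criterion the tree requiring fewer changes is preferred. In the toy data the grouping of a1 with a2 and of a3 with a4 is the shorter tree, which mirrors how alleles sharing restriction sites fall into the same lineage cluster.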

The distribution patterns of Ab alleles in the
phylogenetic analysis suggest that alleles harboring
transposable elements are not subject to negative
selection. The number of alleles related by descent keeps
increasing, as evidenced by the clustering of alleles within
each lineage. This finding is in direct contrast to the
neutrality model of Golding et al. (1986), which suggested that
haplotypes carrying transposable elements are selectively
deleterious because they are located at the tips of phylogenetic
trees. However, a quantitative population genetic model
proposed  by  Hickey  (1982)  suggested  that  the  spread  of 
transposable  genetic  elements  in  natural  populations  depends 
on  sexual  reproduction  of  the  host.  These  self-replicative 
transposable  elements  do  not  have  to  be  selectively  neutral 
at  the  organismal  level;  they  can  generate  major  deleterious 
effects  on  the  host  and  still  spread  through  the  population. 

This analysis has allowed us to construct an evolutionary
tree in which the different alleles currently distributed
throughout natural populations are generated by stepwise
divergence from various lineage progenitor alleles. The
presence of different sizes of retroposon insertions in
intron 2, between the Aβ1 and Aβ2 exons of Ab alleles, has served
as an evolutionary tag in deciphering the phylogenetic
relationships of these alleles.

The  restriction  map  data,  DNA  sequence  analysis,  and 
phylogenetic  analysis  are  consistent  with  the  idea  that  the 
Mhc class II genes are evolving in a trans-species mode. Each
lineage of Ab genes, consisting of alleles closely related to
each other, is composed of alleles belonging to different
species and subspecies of genus Mus.